Text Mining

Carrot2 is an open source search results clustering engine. It can automatically organize small collections of documents (search results, but not only those) into thematic categories.

Carrot2 - Open Source Search Results Clustering Engine

http://project.carrot2.org/
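To illustrate the idea of organizing a small set of search-result snippets into labeled thematic groups, here is a toy keyword-grouping heuristic. It is a minimal sketch, not Carrot2's algorithm or API. The function names, the tiny stopword list, and the `min_docs` threshold are all hypothetical choices made for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tiny stopword list, for this illustration only.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "for", "in", "on", "to", "with", "is"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def cluster_by_keyword(snippets, min_docs=2):
    """Group snippets under the most widespread term each one contains.

    Toy stand-in for search results clustering (NOT Carrot2's method):
    terms are ranked by how many snippets contain them, and each snippet
    goes to its highest-ranked term that appears in at least min_docs
    snippets. Snippets with no such term go to an "Other" group.
    """
    token_sets = [set(tokenize(s)) for s in snippets]
    doc_freq = Counter(term for tokens in token_sets for term in tokens)
    clusters = defaultdict(list)
    for snippet, tokens in zip(snippets, token_sets):
        candidates = [t for t in tokens if doc_freq[t] >= min_docs]
        if candidates:
            # Highest document frequency wins; ties are broken alphabetically.
            label = min(candidates, key=lambda t: (-doc_freq[t], t))
        else:
            label = "Other"
        clusters[label].append(snippet)
    return dict(clusters)

results = [
    "Python web scraping tutorial",
    "Web scraping with Scrapy",
    "Desktop search for local documents",
    "Indexing documents for fast search",
]
for label, docs in cluster_by_keyword(results).items():
    print(label, docs)
```

On this input the first two snippets are grouped under "scraping" and the last two under "documents". A real clustering engine also has to produce readable multi-word labels and handle ambiguous terms, which a single-keyword heuristic like this does not attempt.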

Rapid-I - RapidMiner

http://rapid-i.com/content/view/181/190/ "Thank you so much for a great product and great support. I am very pleased with this support package so far, it has increased my productivity amazingly."
http://www.web-datamining.net/ Web data mining is a technique for crawling through various web resources to collect required information, which helps an individual or a company promote a business and understand market dynamics, new promotions appearing on the Internet, and so on.

Web Data Mining - An Introduction

Bixo Labs has merged with Scale Unlimited and is now providing complete consulting and training services for a wide range of big data problems, including web crawling, data mining and search. During client engagements, we repeatedly saw the need for mentoring and training to ensure a smooth hand-off of projects to internal team members. In addition, Bixo Labs has already been teaching Hadoop classes under contract with Scale Unlimited. Given our increased focus on mentoring and training, it made sense for Bixo Labs to acquire and merge with Scale Unlimited, to provide complete consulting solutions that include support for bringing our clients’ internal staff up to speed on the open source technologies we use, such as Hadoop, Solr and Cascading. http://www.scaleunlimited.com/bixo-labs-merger/

Open Source Data Mining Tools | Elastic Web Mining | Bixo Labs

Web-Harvest Project Home Page

Web-Harvest is an open source web data extraction tool written in Java. It offers a way to collect desired web pages and extract useful data from them. http://web-harvest.sourceforge.net/

Scrapy | An open source web scraping framework for Python

Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing. Scrapy was designed with extensibility in mind, so it provides several mechanisms to plug in new code without having to touch the framework core. http://scrapy.org/
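The core step shared by Scrapy, Web-Harvest and similar tools is turning raw HTML into structured records. Rather than reproduce Scrapy's own spider and selector API, here is a minimal sketch of that extraction step using only Python's standard `html.parser`. The class name `LinkExtractor`, the helper `extract_links`, the sample page and the base URL are hypothetical; the sketch pulls out (absolute URL, anchor text) pairs and does no crawling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # resolved href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

page = """
<html><body>
  <h1>Tools</h1>
  <a href="/carrot2">Carrot2</a>
  <a href="https://scrapy.org/">Scrapy</a>
  <a name="anchor-without-href">skip me</a>
</body></html>
"""
print(extract_links(page, "http://example.com/tools/"))
```

The relative link is resolved to `http://example.com/carrot2`, and the `<a>` tag without an `href` is ignored. A crawler would feed the extracted URLs back into a fetch queue; frameworks such as Scrapy add scheduling, politeness and retries around this step.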

Web Data Harvesting: Web Scraping Software

http://amchamni.org/2010/02/web-scraping-software.html Web scraping software provides customer information, marketing information, and competitor information. Businesses develop a closer relationship with their customers by discovering what products are selling, what product defects have been encountered, what consumers like or dislike about a product, or which groups of customers favor a product. By analyzing how a company stands in relation to its competitors and revealing current or upcoming trends, the software helps guide business decisions. Price comparisons, buying and selling trends, and consumer logistics are all data that can be gathered, stored, analyzed, and put to use in profitable business platforms.

DocFetcher - Fast document search

DocFetcher is a free desktop search application: it lets you search for information inside the documents stored on your computer. In other words, it is a search engine like Google, but for your personal documents. The software is currently available for Windows and Linux. DocFetcher builds indexes from the files, and searches are then answered from these indexes in a fraction of a second. You can create either permanent indexes for folders containing a large number of rarely modified files, or temporary indexes for small folders containing files that change often. http://docfetcher.sourceforge.net/fr/index.html
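Why an index makes search fast: instead of rescanning every file per query, the program looks up each query word in a precomputed map from words to the files containing them. Below is a minimal in-memory inverted index sketch illustrating this. It is not DocFetcher's implementation; the class, method names and sample file names are hypothetical. The `remove` method hints at why frequently changing files are a cost: each change means dropping the file's old entries and re-adding it.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

class InvertedIndex:
    """Map each term to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    def add(self, doc_id, text):
        self.documents[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        # When a file changes, drop its old postings before re-adding it.
        text = self.documents.pop(doc_id, None)
        if text is None:
            return
        for term in set(tokenize(text)):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so intersections don't mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = InvertedIndex()
index.add("report.txt", "Quarterly sales report for the board")
index.add("notes.txt", "Meeting notes: sales targets and hiring")
index.add("recipe.txt", "Tomato soup recipe")
print(sorted(index.search("sales")))   # ['notes.txt', 'report.txt']
print(index.search("sales report"))    # {'report.txt'}
```

Each query touches only the posting sets for its own terms, so lookup cost is independent of the total amount of text indexed. Production desktop search tools add ranking, phrase queries and on-disk storage on top of this basic structure.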