Text Mining

Carrot2 - Open Source Search Results Clustering Engine.

I - RapidMiner.

Web Data Mining - An Introduction.

Open Source Data Mining Tools | Elastic Web Mining | Bixo Labs. Below is a report on the open source data mining tools session at the ACM data mining unconference this past Sunday (01 Nov 2009). This only covers tools that the panelists had used, so it’s not a survey of the available tools. See Jeff Dalton’s blog post on Java Open Source NLP and Text Mining tools for an example of a more complete list of a closely related group of tools.

Weka

Paul O’Rorke talked about Weka, a collection of machine learning algorithms for data mining tasks. Concerns about whether it’s still viable. One person said that pieces of it are still viable for clustering, feature selection. An attendee mentioned MOA.

R Language

David Smith talked about R. Attendee asked about comparing Matlab & R, with respect to viability in a production environment.

Attendee said many people use R for prototyping and generating models, but production uses something else. Paul mentioned that R provides a very compact representation of data mining tasks.

Mahout Hadoop Bixo.

Web-Harvest Project Home Page.

Scrapy | An open source web scraping framework for Python.

Web Data Harvesting: Web Scraping Software. Web scraping software is a tool that makes gathering large amounts of information relatively easy. It is useful to anyone who needs to search for comparable information from various locations and put it into a usable context. Finding extensive information in a short period of time this way is cost effective. Applications are used every day in business, medicine, meteorology, government, and law enforcement. The software is user friendly and can be operated by anyone from non-technical data collectors to experienced Web designers. Programs are available for purchase in stores or online.

A user opens the software and begins by programming an “agent”, the tool that will retrieve the requested information. Web scraping software can provide customer information, marketing information, and competitor information. There have been legal ramifications, as some have complained about intrusion and copyright infringement.

Screen Scraper.
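The “agent” idea from the web scraping excerpt above can be illustrated with a small sketch. This is not how any of the listed products work internally; it is a minimal example that uses only Python's standard `html.parser` module. The names `LinkAgent`, `scrape_links`, and `SAMPLE_PAGE` are hypothetical and introduced here for illustration, and the sample HTML is made up so the example runs without network access.

```python
from html.parser import HTMLParser


class LinkAgent(HTMLParser):
    """Hypothetical 'agent': collects (href, text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # results gathered so far
        self._href = None    # href of the anchor currently being read
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # Start recording when an <a> tag opens.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Keep text only while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Store the finished link when the anchor closes.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    """Run the agent over an HTML string and return the links it found."""
    agent = LinkAgent()
    agent.feed(html)
    agent.close()
    return agent.links


# Made-up sample page (an assumption for this sketch, not real data).
SAMPLE_PAGE = """
<html><body>
  <a href="https://example.com/a">Product A</a>
  <p>No link here</p>
  <a href="https://example.com/b">Product <b>B</b></a>
</body></html>
"""

links = scrape_links(SAMPLE_PAGE)
for href, text in links:
    print(href, text)
```

A real agent would also fetch pages over the network and should respect each site's terms of use, given the legal concerns noted above; a framework such as Scrapy handles the crawling side.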