
Web-Harvest Project Home Page


Web Data Harvesting: Web Scraping Software

Web scraping software is a tool that makes gathering large amounts of information relatively easy. It is useful to anyone who needs to collect comparable information from many different sources and put it into a usable context, and because it gathers extensive information in a short period of time, it is cost-effective. Such applications are used every day in business, medicine, meteorology, government, and law enforcement. The software is user-friendly and can be operated by anyone from non-technical data collectors to experienced Web designers. Programs are available for purchase in stores or online. A user opens the software and begins by programming an “agent”, the component that retrieves the requested information. Web scraping software can supply customer, marketing, and competitor information. Its use has had legal ramifications, as some have complained of intrusion and copyright infringement.

Screen Scraper
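The “agent” step described above can be sketched with Python's standard library. This is a minimal, hypothetical illustration, not how any particular commercial product works: the class name `ScrapingAgent`, the helpers `scrape` and `scrape_url`, and the idea of selecting data by the HTML `class` attribute are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class ScrapingAgent(HTMLParser):
    """Hypothetical agent: collects the text of every element whose
    class attribute contains `target_class` (e.g. "price")."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.results: list[str] = []
        self._depth = 0              # > 0 while inside a matching element
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1         # nested element inside a match
        elif self.target_class in classes:
            self._depth = 1          # start collecting text
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:         # the matching element just closed
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def scrape(html: str, target_class: str) -> list[str]:
    """Run the agent over an HTML string and return the collected texts."""
    agent = ScrapingAgent(target_class)
    agent.feed(html)
    agent.close()
    return agent.results


def scrape_url(url: str, target_class: str) -> list[str]:
    """Fetch a live page (requires network access) and scrape it."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return scrape(response.read().decode(charset), target_class)
```

For example, `scrape('<li>Phone A <span class="price">$199</span></li>', "price")` returns `['$199']`. Real scraping tools add scheduling, pagination, and politeness controls on top of this core loop of fetching pages and extracting matching fields.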

IRobotSoft -- Visual Web Scraping and Web Automation Tool for FREE

GoodRelations - GoodRelations is a lightweight ontology for annotating offerings and other aspects of e-commerce on the Web. GoodRelations is the only OWL DL ontology officially supported by both Google and Yahoo. It provides a standard vocabulary for expressing, for example, that a particular Web site offers cellphones of a certain make and model at a certain price, that a piano house offers maintenance for pianos weighing less than 150 kg, or that a car rental company leases cars of a certain make and model from a particular set of branches across the country. Most, if not all, commercial and functional details of e-commerce scenarios can also be expressed, e.g. eligible countries, payment and delivery options, quantity discounts, and opening hours. The GoodRelations ontology is available under the Creative Commons Attribution 3.0 license.
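As a rough illustration of the kind of statement GoodRelations lets a site publish (an offer to sell a cellphone at a certain price), the sketch below emits a single sell offer in Turtle. It is a hypothetical helper, not part of any GoodRelations tooling: the function name `offer_to_turtle`, the example values, and the choice of Turtle output are assumptions, and a real site would typically embed such data in its pages and build it with an RDF library rather than string formatting.

```python
GR = "http://purl.org/goodrelations/v1#"


def _turtle_string(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def offer_to_turtle(offer_id: str, name: str, price: float, currency: str) -> str:
    """Serialize a simple sell offer using GoodRelations terms (sketch).

    `currency` is expected to be an ISO 4217 code such as "USD".
    """
    return "\n".join([
        f"@prefix gr: <{GR}> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<#{offer_id}> a gr:Offering ;",
        f'    gr:name "{_turtle_string(name)}"@en ;',
        "    gr:hasBusinessFunction gr:Sell ;",          # the offer is a sale
        "    gr:hasPriceSpecification [",
        "        a gr:UnitPriceSpecification ;",
        f'        gr:hasCurrency "{currency}" ;',
        f'        gr:hasCurrencyValue "{price:.2f}"^^xsd:float',
        "    ] .",
    ])
```

Calling `offer_to_turtle("offer1", "Cellphone X", 199, "USD")` produces an offering whose price specification carries the currency code `USD` and the value `"199.00"^^xsd:float`. Other details listed above, such as payment options or eligible countries, would be further properties on the same offering.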

Web Data Mining - An Introduction

Open Source Data Mining Tools | Elastic Web Mining | Bixo Labs

Below is a report on the open source data mining tools session at the ACM data mining unconference this past Sunday (01 Nov 2009). It covers only tools that the panelists had used, so it is not a survey of the available tools. See Jeff Dalton’s blog post on Java Open Source NLP and Text Mining tools for an example of a more complete list of a closely related group of tools.

Weka

Paul O’Rorke talked about Weka, a collection of machine learning algorithms for data mining tasks. An attendee mentioned MOA.

R Language

David Smith talked about R. An attendee asked how Matlab and R compare with respect to viability in a production environment. An attendee said many people use R for prototyping and generating models, but use something else in production. Paul mentioned that R provides a very compact representation of data mining tasks.

Nicolas Cebron talked about KNIME (pronounced “naim”), a modular data exploration platform. An attendee asked about the long-term viability of KNIME.

Mahout

Hadoop

Bixo

Gapminder

Linked Data | Linked Data - Connect Distributed Data across the Web

Wolfram|Alpha

Carrot2 - Open Source Search Results Clustering Engine