Scraping

Project SIKULI. Screen-scraper.com. Mape/node-scraper.

Web Scraping with Perl. We need to scrape data from some websites with Perl for a school project.

Web Scraping with Perl

Here is a simple script that I used to test the Web-Scraper package, which can be found on CPAN. This is how the code works: first, you have to find a website that contains the data you want. I used the UCI ProTour website. If you look at the source code, you will notice that the data is stored in a table.

An open source web scraping framework for Python.

PhantomJS: Headless WebKit with JavaScript API.

Scraping the web with Node.io. Node.io is a relatively new screen scraping framework that allows you to easily scrape data from websites using JavaScript, a language that I think is perfectly suited to the task.
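To make the table-extraction step from the Perl walkthrough above more concrete, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not the Web-Scraper script itself. The class name `TableScraper`, the helper `scrape_table`, and the sample markup (including the rider names and points) are hypothetical stand-ins for a page whose data sits in a table.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return all table rows found in the given HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Rider</th><th>Points</th></tr>
  <tr><td>1</td><td>Rider A</td><td>120</td></tr>
  <tr><td>2</td><td>Rider B</td><td>95</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_HTML):
        print(row)
```

The sketch treats every table on the page alike. A real scraper would usually narrow the match to one table, for example by checking an `id` or `class` attribute in `handle_starttag`.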

Scraping the web with Node.io

It's built on top of Node.js, but you don't need to know any Node.js to get started, and you can run your node.io jobs straight from the command line. The existing documentation is pretty good and includes a few detailed examples, such as a job that returns the number of Google search results for some given keywords and prints them when run from the command line.

Data Extraction, Web Screen Scraping Tool, Mozenda Scraper.

Screen scraping & UI automation solutions for desktop and web.
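As a rough illustration of the keyword-count job described for Node.io above (this is not Node.io code), the same shape can be sketched in Python: take a list of keywords, fetch a page for each, and pull a result count out of the text. Everything here is an assumption made for the example: the "N results" wording matched by the regular expression, the function names `result_count` and `run_job`, and the `fake_fetch` stand-in that returns canned text instead of contacting a search engine.

```python
import re

# Assumed wording of the results line, e.g. "About 1,230,000 results".
RESULT_COUNT = re.compile(r"(\d[\d,]*)\s+results", re.IGNORECASE)


def result_count(page_text):
    """Return the result count found in page_text, or None if absent."""
    match = RESULT_COUNT.search(page_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def run_job(keywords, fetch):
    """Run one lookup per keyword; fetch(keyword) must return page text."""
    return {keyword: result_count(fetch(keyword)) for keyword in keywords}


def fake_fetch(keyword):
    """Hypothetical stand-in for an HTTP request, with canned page text."""
    pages = {
        "perl": "About 1,230,000 results (0.42 seconds)",
        "node.io": "About 58,400 results (0.31 seconds)",
    }
    return pages.get(keyword, "")


if __name__ == "__main__":
    for keyword, count in run_job(["perl", "node.io"], fake_fetch).items():
        print(keyword, count)
```

Passing the fetch function in as a parameter keeps the parsing logic testable without network access. A real job would swap `fake_fetch` for an actual HTTP request.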

Refine, reuse and request data.