Screen Scraping

Stateful programmatic web browsing in Python, after Andy Lester’s Perl module WWW::Mechanize. The examples on the mechanize page are written for a website that does not exist (example.com), so they cannot be run. http://wwwsearch.sourceforge.net/mechanize/
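To make "stateful" concrete: the state a browsing session carries between requests is mostly cookies. The following is a minimal sketch using only the Python 3 standard library, not mechanize's own API; the helper name make_stateful_opener and the User-Agent string are assumptions made for illustration.

```python
import http.cookiejar
import urllib.request


def make_stateful_opener():
    """Return (opener, jar). The opener stores cookies it receives in jar
    and sends them back on later requests, like a browser session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Placeholder User-Agent; pick a value that identifies your own bot.
    opener.addheaders = [("User-Agent", "example-bot/0.1")]
    return opener, jar


if __name__ == "__main__":
    opener, jar = make_stateful_opener()
    # No request has been made yet, so the jar holds no cookies.
    # Calling opener.open(url) on a real site would fill it from
    # any Set-Cookie headers in the response.
    print(len(jar))
```

Every request made through the same opener shares the jar, which is what lets a login on one request carry over to the next.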

mechanize

mechanize – Writing Bots in Python Made Simple by Guy Rutenberg

http://www.guyrutenberg.com/2012/03/29/mechanize-writing-bots-in-python-made-simple/ I’ve been using Python to write various bots and crawlers for a long time. A few days ago I needed to write a simple bot to remove some 400+ spam pages in Sikumuna, so I took an old script of mine (from 2006) in order to modify it. The script used ClientForm, a Python module that allows you to easily parse and fill HTML forms. I quickly found that ClientForm is now deprecated in favor of mechanize.
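As a rough illustration of what "parse and fill HTML forms" involves, here is a minimal standard-library sketch. It is not the ClientForm or mechanize API; the names FormFieldParser and fill_form, and the sample form with its token and page fields, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect the name/value pairs of named <input> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:  # unnamed inputs (e.g. a bare submit button) are skipped
            self.fields[name] = attributes.get("value") or ""


def fill_form(html, **values):
    """Parse the inputs in html, override some values, and return the
    URL-encoded body that submitting the form would send."""
    parser = FormFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)
    unknown = set(values) - set(fields)
    if unknown:
        raise KeyError(f"no such form field(s): {sorted(unknown)}")
    fields.update(values)
    return urlencode(fields)


# Hypothetical form, loosely modelled on a page-deletion bot.
SAMPLE_FORM = (
    '<form action="/delete" method="post">'
    '<input type="hidden" name="token" value="abc123">'
    '<input type="text" name="page">'
    '<input type="submit" value="Delete">'
    "</form>"
)

if __name__ == "__main__":
    print(fill_form(SAMPLE_FORM, page="Spam_1"))  # token=abc123&page=Spam_1
```

Hidden fields such as the token are carried over unchanged, which is the main reason to parse the form rather than hand-build the request body.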
Full API documentation is in the docstrings and the documentation of urllib2. The documentation in these web pages is in need of reorganisation at the moment, after the merge of ClientCookie and ClientForm into mechanize.

mechanize — Documentation

http://wwwsearch.sourceforge.net/mechanize/documentation.html#examples
scrape.py is a Python module for scraping content from webpages. Using it, you can easily fetch pages, follow links, and submit forms. Cookies, redirections, and SSL are handled automatically. (For SSL, you either need a version of Python with the socket.ssl function, or the curl command-line utility.) http://zesty.ca/scrape/
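Following links starts with finding them and resolving relative addresses against the page URL. The sketch below shows that step with the Python 3 standard library only; it does not use scrape.py's interface, and the names LinkParser and extract_links, along with the example.com URLs, are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without href (e.g. <a name="x">) are skipped
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every link in html, resolved against base_url, in page order."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    page = '<a href="/about">About</a> <a href="news.html">News</a>'
    for url in extract_links(page, "http://example.com/docs/index.html"):
        print(url)
```

A crawler would fetch each returned URL in turn and repeat the extraction, keeping a set of visited addresses to avoid loops.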

scrape.py

Webscraping with Python

WebScraping

Documentation / 3rd party libraries

ScraperWiki supports a number of 3rd party Python libraries that we recommend for screen scraping, data analysis and data visualisation. https://scraperwiki.com/docs/python/python_libraries/