background preloader


Stateful programmatic web browsing in Python, after Andy Lester’s Perl module WWW::Mechanize. The examples below are written for a website that does not exist (, so cannot be run. There are also some working examples that you can run. import reimport mechanize br = mechanize.Browser()" follow second link with element text matching regular expressionresponse1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)assert br.viewing_html()print br.title()print response1.geturl()print # headersprint # body br.select_form(name="order")# Browser passes through unknown attributes (including methods)# to the selected["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__)# Submit current form. # print currently selected form (don't call .submit() on this, use br.submit())print br.form You may control the browser’s policy by using the methods of mechanize.Browser’s base class, mechanize.UserAgent.

Related:  Python Forum ScrapingWebscraping with PythonpythonPython

Category:LanguageBindings -> PySide EnglishEspañolMagyarItalian한국어日本語 Welcome to the PySide documentation wiki page. The PySide project provides LGPL-licensed Python bindings for the Qt. It also includes complete toolchain for rapidly generating bindings for any Qt-based C++ class hierarchies. PySide Qt bindings allow both free open source and proprietary software development and ultimately aim to support Qt platforms. mechanize — Documentation Full API documentation is in the docstrings and the documentation of urllib2. The documentation in these web pages is in need of reorganisation at the moment, after the merge of ClientCookie and ClientForm into mechanize. Tests and examples Examples

Custom template tags and filters Django’s template language comes with a wide variety of built-in tags and filters designed to address the presentation logic needs of your application. Nevertheless, you may find yourself needing functionality that is not covered by the core set of template primitives. You can extend the template engine by defining custom tags and filters using Python, and then make them available to your templates using the {% load %} tag. Code layout¶ The most common place to specify custom template tags and filters is inside a Django app. If they relate to an existing app, it makes sense to bundle them there; otherwise, they can be added to a new app.

Requests and Responses — Scrapy 0.14.4 documentation Using FormRequest to send data via HTTP POST¶ If you want to simulate a HTML Form POST in your spider and send a couple of key-value fields, you can return a FormRequest object (from your spider) like this: is a Python module for scraping content from webpages. Using it, you can easily fetch pages, follow links, and submit forms. Cookies, redirections, and SSL are handled automatically. (For SSL, you either need a version of Python with the socket.ssl function, or the curl command-line utility.) does not parse the page into a complete parse tree, so it can handle pages with sloppy syntax. Django and AJAX Form Submissions - say 'goodbye' to the page refresh This is a collaboration piece between Real Python and the mighty Nathan Nichols, using a collaborative method we have dubbed ‘agile blogging`. Say ‘hi’ @natsamnic. Let’s get down to business:

Weblogs Forum - Screen Scraping With Python Summary Web-enabling an old terminal-oriented application turns into more fun than expected. A blow-by-blow account of writing a screen scraper with Python and pexpect. I recently finished a project for a local freight broker. They run their business on an old SCO Unix-based "green screen" terminal application. They wanted to enable some functionality on their web site, so customers could track their shipments, and carriers could update their location and status. python recipe: grab page, scrape table, download file . palewire Here's a change of pace. Our first few lessons focused on how you can use Python to goof with a bunch of local files. This time we're going to try something different: using Python to go online and screw around with the Web.

Django tips: A simple AJAX example, part 1 One thing that’s come up over and over again in the Django IRC channel and on the mailing lists is the need for good examples of “how to do AJAX with Django”. Now, one of my goals in life at the moment is to try to fill in the gaps in Django’s documentation, so… Over the next couple of entries we’re going to walk through a very simple form, which will submit via AJAX to a Django view and handle the result without a page refresh. Pattern Pattern is a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, a HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization. The module is free, well-document and bundled with 50+ examples and 350+ unit tests. Download Installation Pattern is written for Python 2.5+ (no support for Python 3 yet).

mechanize – Writing Bots in Python Made Simple by Guy Rutenberg I’ve been using python to write various bots and crawler for a long time. Few days ago I needed to write some simple bot to remove some 400+ spam pages in Sikumuna, I took an old script of mine (from 2006) in order to modify it. The script used ClientForm, a python module that allows you to easily parse and fill html forms using python. I quickly found that ClientForm is now deprecated in favor of mechanize. In the beginning I was partly set back by the change, as ClientForm was pretty easy to use, and mechanize‘s documentation could use some improvement.