Web Scraping
Robots or bots are automatic processes that interact with Wikipedia (and other Wikimedia projects) as though they were human editors. This page explains how to develop a bot for use on Wikimedia projects; much of it is transferable to other wikis based on MediaWiki. The explanation is geared mainly towards those who have some prior programming experience but are unsure how to apply it to creating a Wikipedia bot.
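As a rough illustration of the first thing a read-only bot does, the sketch below builds a MediaWiki Action API query for the current wikitext of a page and extracts the text from the JSON reply. It uses only the standard library. The endpoint shown is the English Wikipedia one; the function names, the page title, and the User-Agent string are hypothetical placeholders, and a real bot would also need to follow the target wiki's bot policy.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; other MediaWiki wikis expose api.php
# under their own script path.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, api_url=API_URL):
    """Build a read-only query URL for the latest revision of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def extract_wikitext(response):
    """Return the wikitext from a decoded formatversion=2 reply, or None."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        return None  # the page does not exist
    return page["revisions"][0]["slots"]["main"]["content"]


def fetch_wikitext(title, user_agent="ExampleBot/0.1 (hypothetical contact)"):
    """Fetch a page over the network; the User-Agent value is a placeholder."""
    request = urllib.request.Request(
        build_query_url(title), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as reply:
        return extract_wikitext(json.load(reply))
```

Calling `fetch_wikitext("Web scraping")` would perform the actual request; the two helper functions can be exercised without network access.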
You didn't write that awful page. You're just trying to get some data out of it. Beautiful Soup is here to help. Since 2004, it's been saving programmers hours or days of work on quick-turnaround screen scraping projects. If you have questions, send them to the discussion group.
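To make the "awful page" scenario concrete, here is a minimal sketch that assumes Beautiful Soup 4 (the `bs4` package) and Python's built-in `html.parser` backend. The markup and the `MESSY_HTML` name are invented for illustration; the list items are deliberately left unclosed.

```python
from bs4 import BeautifulSoup

# Invented sample markup with unclosed <li> tags.
MESSY_HTML = """
<html><body>
<h1>Cheese shop</h1>
<ul>
  <li><a href="/cheese/mozzarella">Mozzarella</a>
  <li><a href="/cheese/caerphilly">Caerphilly</a>
</ul>
</body></html>
"""

soup = BeautifulSoup(MESSY_HTML, "html.parser")

# Collect (link text, href) pairs for every anchor in the document.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]
print(links)
```

Both links are found even though the list markup is not well formed, which is the kind of tolerance that makes the library useful for quick scraping jobs.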
Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize. The examples below are written for a website that does not exist (example.com), so they cannot be run. There are also some working examples that you can run.

    import re
    import mechanize

    br = mechanize.Browser()
    br.open("http://www.example.com/")
    # follow second link with element text matching regular expression
    response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
    assert br.viewing_html()
    print(br.title())
    print(response1.geturl())
    print(response1.info())  # headers
    print(response1.read())  # body

    br.select_form(name="order")
    # Browser passes through unknown attributes (including methods)
    # to the selected HTMLForm.
    br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
    # Submit current form.