
Crawl a website with scrapy - *.isBullsh.it

In this article, we are going to see how to scrape information from a website, in particular from all pages with a common URL pattern. We will see how to do that with Scrapy, a very powerful, yet simple, scraping and web-crawling framework. For example, you might be interested in scraping information about each article of a blog and storing it in a database. To achieve this, we will implement a simple spider using Scrapy, which will crawl the blog and store the extracted data in a MongoDB database. We will assume that you have a working MongoDB server, and that you have installed the pymongo and scrapy Python packages, both installable with pip.

If you have never toyed around with Scrapy, you should first read this short tutorial. In this example, we'll see how to extract the following information from each isbullsh.it blog post: title, author, tags, release date, and URL.
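The article builds the actual extraction with Scrapy selectors; as a rough, standard-library-only sketch of what the spider's parse step does, here is a parser that pulls some of those fields out of a post page. The class names (`title`, `author`, `tag`) are hypothetical markup, not the real isbullsh.it templates.

```python
from html.parser import HTMLParser

class BlogPostParser(HTMLParser):
    """Collect title, author, and tags from a simplified blog post page.

    Hypothetical markup for illustration; the real spider would use
    Scrapy selectors against the site's actual templates.
    """

    def __init__(self):
        super().__init__()
        self.item = {"title": None, "author": None, "tags": []}
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "h1" and cls == "title":
            self._field = "title"
        elif tag == "span" and cls == "author":
            self._field = "author"
        elif tag == "a" and cls == "tag":
            self._field = "tag"

    def handle_data(self, data):
        if self._field == "tag":
            self.item["tags"].append(data.strip())
        elif self._field in ("title", "author"):
            self.item[self._field] = data.strip()
        self._field = None

page = """
<html><body>
  <h1 class="title">Why estimates are bullshit</h1>
  <span class="author">Balthazar</span>
  <a class="tag">estimation</a> <a class="tag">software</a>
</body></html>
"""
parser = BlogPostParser()
parser.feed(page)
print(parser.item)
```

In the real spider, a dict like `parser.item` is what gets handed to a pipeline that inserts it into MongoDB.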

MM

I am currently working on a Flask feed reader app. The plan is to build a reader app that attaches to static site generators like Pelican, Nikola, and Mynt, among many others. I have the basic structure of the app done, and I wanted to implement feed parsing. With quite a bit of effort I probably could have rolled my own (buggy) feed parser, but I am a believer in not re-inventing the wheel.
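To give a taste of the wheel being avoided here: even a bare-bones RSS 2.0 reader, with none of the real-world edge cases the feedparser package handles, looks something like this standard-library sketch.

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text):
    """Return (feed_title, [(entry_title, link), ...]) from an RSS 2.0 string.

    Minimal sketch: handles only well-formed RSS 2.0, unlike feedparser,
    which also copes with Atom, encoding quirks, and malformed feeds.
    """
    channel = ET.fromstring(xml_text).find("channel")
    feed_title = channel.findtext("title")
    entries = [(item.findtext("title"), item.findtext("link"))
               for item in channel.findall("item")]
    return feed_title, entries

sample = """<rss version="2.0"><channel>
  <title>Example Blog</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

print(parse_rss(sample))
```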

Extract all links from a web page

Problem: you want to extract all the links from a web page, and you need them in absolute-path format since you want to process them further. Solution: Unix commands have a very nice philosophy: "do one thing and do it well". Keeping that in mind, here is my link extractor:
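The extractor itself did not survive the excerpt, so the following is a stand-in rather than the author's code: a minimal standard-library extractor that collects every href and resolves it against the page URL, so all links come out absolute.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect all <a href> targets, resolved to absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin makes relative paths absolute and leaves
                # already-absolute URLs untouched.
                self.links.append(urljoin(self.base_url, href))

page = '<a href="/about">About</a> <a href="http://other.example/x">X</a>'
extractor = LinkExtractor("http://example.com/blog/")
extractor.feed(page)
print(extractor.links)
```

In the "do one thing well" spirit, the class only extracts; fetching the page (e.g. with urllib or Requests) stays a separate concern.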

Designing a RESTful API with Python and Flask

In recent years REST (REpresentational State Transfer) has emerged as the standard architectural design for web services and web APIs. In this article I'm going to show you how easy it is to create a RESTful web service using Python and the Flask microframework.

What is REST? The characteristics of a REST system are defined by six design rules, including:

- Client-Server: there should be a separation between the server that offers a service and the client that consumes it.
- Stateless: each request from a client must contain all the information required by the server to carry out the request.

What is a RESTful web service? The REST architecture was originally designed to fit the HTTP protocol that the world wide web uses. Central to the concept of RESTful web services is the notion of resources. The HTTP request methods are typically designed to affect a given resource in standard ways. The REST design does not require a specific format for the data provided with the requests.
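This is not the article's Flask code, just a plain-Python illustration of the resource idea: the standard HTTP methods map onto create/read/replace/delete operations on a resource collection (a hypothetical in-memory "tasks" collection here).

```python
class TaskResource:
    """An in-memory 'tasks' collection, manipulated REST-style.

    Illustrative only: in a real Flask app each method would back a
    route such as GET/POST /tasks and GET/PUT/DELETE /tasks/<id>.
    """

    def __init__(self):
        self.tasks = {}
        self.next_id = 1

    def get(self, task_id=None):      # GET: read one task, or list all
        if task_id is None:
            return list(self.tasks.values())
        return self.tasks.get(task_id)

    def post(self, data):             # POST: create a new task
        task = {"id": self.next_id, **data}
        self.tasks[self.next_id] = task
        self.next_id += 1
        return task

    def put(self, task_id, data):     # PUT: replace an existing task
        task = {"id": task_id, **data}
        self.tasks[task_id] = task
        return task

    def delete(self, task_id):        # DELETE: remove a task
        return self.tasks.pop(task_id, None)

api = TaskResource()
created = api.post({"title": "write report"})
print(created)
```

Note how the server keeps no per-client session: every call carries everything needed to act on the resource, which is the Stateless rule above in miniature.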

mechanize 0.2.5

Stateful programmatic web browsing, after Andy Lester's Perl module WWW::Mechanize. mechanize.Browser implements the urllib2.OpenerDirector interface. Browser objects have state, including navigation history, HTML form state, cookies, etc. The set of features and URL schemes handled by Browser objects is configurable. The library also provides an API that is mostly compatible with urllib2: your urllib2 program will likely still work if you replace "urllib2" with "mechanize" everywhere.

Features include: ftp:, http:, and file: URL schemes; browser history; hyperlink and HTML form support; HTTP cookies; HTTP-EQUIV and Refresh; the Referer [sic] header; robots.txt handling; redirections; proxies; and Basic and Digest HTTP authentication.

Requests: HTTP for Humans — Requests 1.2.0 documentation

I'm writing a book on Flask - Robert Picard

I've been looking for a project to do this summer. I'm going to have a little over two months off from school, and I don't have a lot of time-consuming responsibilities at work this summer. Yesterday, I found my project: I'm going to write a book on web development with Flask. I'm not an expert on web development or on Flask. I am, however, fairly knowledgeable about both. Now, I write. I'll be posting to this blog with updates on my progress, as well as samples from the book as I write it. Also, if you have suggestions for subjects you'd like to see in the book, send me an email at mail@ this domain, or leave a comment below.

Scrapy 0.17.0 documentation

Beautiful Soup: We called him Tortoise because he taught us.

You didn't write that awful page. You're just trying to get some data out of it. Beautiful Soup is here to help. Since 2004, it's been saving programmers hours or days of work on quick-turnaround screen-scraping projects. Beautiful Soup is a Python library designed for quick-turnaround projects like screen scraping. It provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need.

Beautiful Soup parses anything you give it and does the tree traversal for you. Valuable data that was once locked up in poorly designed websites is now within your reach.
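A small taste of those "few simple methods", assuming the beautifulsoup4 package is installed; the markup and names here are made up for illustration.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = """
<html><body>
  <p class="title"><b>The Dormouse's story</b></p>
  <p class="story">Once upon a time there were
     <a href="http://example.com/elsie" id="link1">Elsie</a> and
     <a href="http://example.com/lacie" id="link2">Lacie</a>.</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Navigate and search the parse tree with find / find_all.
title = soup.find("b").get_text()
hrefs = [a["href"] for a in soup.find_all("a")]
second = soup.find("a", id="link2").get_text()

print(title)
print(hrefs)
print(second)
```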

Getting Dell warranty expirations with python, my first time playing with web scraping

As I mentioned in my post on my first learning resources, right after finishing the Google training, the first script I wrote was one to get a Dell hardware warranty expiration date given the service tag. It was my first attempt at web scraping, something you learn in the Google Python videos. I am not sure whether this is the best or most efficient way to do it, but it was a way for me to test out what I had just learned. If you have any advice, don't hesitate to share it in the comments.