Beautiful Soup Documentation — Beautiful Soup 4.2.0 documentation

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work. These instructions illustrate all major features of Beautiful Soup 4, with examples. I show you what the library is good for, how it works, how to use it, how to make it do what you want, and what to do when it violates your expectations. The examples in this documentation should work the same way in Python 2.7 and Python 3.2. You might be looking for the documentation for Beautiful Soup 3. This documentation has been translated into other languages by Beautiful Soup users, including Chinese, Japanese (external link), and Korean. The documentation works through a single example HTML document, showing simple ways to navigate that data structure and common tasks such as extracting all the URLs found within a page’s <a> tags.
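As a rough illustration of the navigation and URL-extraction tasks described above, here is a minimal sketch. It assumes Beautiful Soup 4 is installed as the bs4 package and uses Python's built-in html.parser. The markup in html_doc is a made-up stand-in, not the example document from the original documentation.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration only).
html_doc = """
<html><head><title>Sample page</title></head>
<body>
<p class="links">
  <a href="http://example.com/one" id="link1">One</a>
  <a href="http://example.com/two" id="link2">Two</a>
  <a id="no-href">No URL here</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Simple navigation: attribute access returns the first matching tag.
title_text = soup.title.string
first_link_id = soup.a["id"]

# Collect the href of every <a> tag that actually has one.
urls = [a.get("href") for a in soup.find_all("a") if a.get("href")]

print(title_text)
print(first_link_id)
print(urls)
```

Using a.get("href") rather than a["href"] avoids a KeyError on anchors that carry no href attribute.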

Pandas Pivot Table Explained - Practical Business Python
Introduction: Most people likely have experience with pivot tables in Excel. Pandas provides a similar function called (appropriately enough) pivot_table. While it is exceedingly useful, I frequently find myself struggling to remember how to use the syntax to format the output for my needs. If you are not familiar with the concept, Wikipedia explains it in high-level terms. As an added bonus, I’ve created a simple cheat sheet that summarizes pivot_table.
The Data: One of the challenges with using pandas' pivot_table is making sure you understand your data and what questions you are trying to answer with the pivot table. In this scenario, I’m going to be tracking a sales pipeline (also called a funnel). Typical questions include: How much revenue is in the pipeline? Many companies will have CRM tools or other software that sales uses to track the process. Using a pandas pivot table can be a good alternative because it is:
Read in the data: Let’s set up our environment first.
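To make the pipeline question above concrete, here is a small sketch of a pivot_table call. It assumes pandas is installed. The data and the column names (Rep, Status, Price) are invented for illustration and are not taken from the article.

```python
import pandas as pd

# Made-up sales pipeline data; column names are illustrative assumptions.
df = pd.DataFrame({
    "Rep": ["Ann", "Ann", "Bob", "Bob", "Bob"],
    "Status": ["won", "pending", "won", "won", "pending"],
    "Price": [1000, 500, 2000, 300, 700],
})

# Total revenue per rep (rows), broken out by deal status (columns).
pipeline = pd.pivot_table(
    df,
    index="Rep",
    columns="Status",
    values="Price",
    aggfunc="sum",
    fill_value=0,
)

# Overall revenue currently in the pipeline.
total_revenue = df["Price"].sum()

print(pipeline)
print(total_revenue)
```

Passing aggfunc explicitly matters here: without it, pivot_table averages the values instead of summing them.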

Time Series analysis tsa - statsmodels 0.7.0 documentation
statsmodels.tsa contains model classes and functions that are useful for time series analysis. This currently includes univariate autoregressive models (AR), vector autoregressive models (VAR) and univariate autoregressive moving average models (ARMA). It also includes descriptive statistics for time series, such as the autocorrelation function, partial autocorrelation function and periodogram, as well as the corresponding theoretical properties of ARMA or related processes. It also includes methods to work with autoregressive and moving average lag-polynomials. Additionally, related statistical tests and some useful helper functions are available. Estimation is done by either exact or conditional maximum likelihood or by conditional least squares, using either the Kalman filter or direct filters. Currently, functions and classes have to be imported from the corresponding module, but the main classes will be made available in the statsmodels.tsa namespace.

Beautiful Soup: We called him Tortoise because he taught us.
You didn't write that awful page. You're just trying to get some data out of it. Beautiful Soup is here to help. Since 2004, it's been saving programmers hours or days of work on quick-turnaround screen scraping projects. Beautiful Soup is a Python library designed for quick-turnaround projects like screen scraping. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. Beautiful Soup parses anything you give it and does the tree traversal for you. Valuable data that was once locked up in poorly designed websites is now within your reach.
Getting and giving support: If you have questions, send them to the discussion group. If you use Beautiful Soup as part of your work, please consider a Tidelift subscription.

Python Lists - Google for Education
Python has a great built-in list type named "list". List literals are written within square brackets [ ]. Lists work similarly to strings -- use the len() function and square brackets [ ] to access data, with the first element at index 0. Assignment with an = on lists does not make a copy. The "empty list" is just an empty pair of brackets [ ]. Python's *for* and *in* constructs are extremely useful, and the first use of them we'll see is with lists. If you know what sort of thing is in the list, use a variable name in the loop that captures that information such as "num", or "name", or "url". The *in* construct on its own is an easy way to test if an element appears in a list (or other collection) -- "value in collection" tests if the value is in the collection, returning True/False. The for/in constructs are very commonly used in Python code and work on data types other than list, so you should just memorize their syntax. You can also use for/in to work on a string.
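The list behaviors described above can be sketched in a few lines. The variable names and values are arbitrary choices for illustration.

```python
colors = ["red", "blue", "green"]
print(colors[0])    # the first element is at index 0
print(len(colors))  # number of elements in the list

# Assignment with = does not copy: both names refer to the same list.
alias = colors
alias.append("yellow")
print(colors)       # the change made through alias is visible here too

# for/in visits each element in order.
nums = [1, 2, 3, 4]
total = 0
for num in nums:
    total += num
print(total)

# `in` on its own tests membership and returns True/False.
has_blue = "blue" in colors
has_pink = "pink" in colors
print(has_blue, has_pink)

# for/in also works on a string, one character at a time.
letters = [ch for ch in "abc"]
print(letters)
```

To get an independent copy instead of a second name for the same list, use list(colors) or colors[:].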

Pattern
Pattern is a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization. The module is free, well-documented and bundled with 50+ examples and 350+ unit tests.
Installation: Pattern is written for Python 2.5+ (no support for Python 3 yet). To install Pattern so that the module is available in all Python scripts, from the command line do:
> cd pattern-2.6
> python setup.py install
If you have pip, you can automatically download and install from the PyPI repository. If none of the above works, you can make Python aware of the module in three ways.
Quick overview: pattern.web, pattern.en, pattern.search, pattern.vector. The pattern.en module is a natural language processing (NLP) toolkit for English.

Requests: HTTP for Humans - Requests 2.7.0 documentation

Python Boot Camp
The BootCamp has wrapped up ... if you took the bootcamp please give us feedback. We took over all the Brower Center! This is the main site for the Python Boot Camp Fall 2013, at the Brower Center (2150 Allston Way) near the UC Berkeley Campus from August 26 (Monday) to August 28 (Wednesday) 2013. If you are interested in a detailed introduction to various Pythonic topics to aid in research, consider the "Python Computing for Data Scientists" seminar course (Thursday 1-4pm; Fall 2013, CC #06080). You must have registered to participate in the Camp, and doing so early means we can keep you informed of any new developments.

I Don’t Need No Stinking API: Web Scraping For Fun and Profit
If you’ve ever needed to pull data from a third party website, chances are you started by checking to see if they had an official API. But did you know that there’s a source of structured data that virtually every website on the internet supports automatically, by default? That’s right, we’re talking about pulling our data straight out of HTML, otherwise known as web scraping. Any content that can be viewed on a webpage can be scraped. If a website provides a way for a visitor’s browser to download content and render that content in a structured way, then almost by definition, that content can be accessed programmatically. Over the past few years, I’ve scraped dozens of websites, from music blogs and fashion retailers to the USPTO and undocumented JSON endpoints I found by inspecting network traffic in my browser. There are some tricks that site owners will use to thwart this type of access, which we’ll dive into later, but they almost all have simple work-arounds.

Google's Python Class | Python Education | Google Developers
Welcome to Google's Python Class -- this is a free class for people with a little bit of programming experience who want to learn Python. The class includes written materials, lecture videos, and lots of code exercises to practice Python coding. These materials are used within Google to introduce Python to people who have just a little programming experience. The first exercises work on basic Python concepts like strings and lists, building up to the later exercises which are full programs dealing with text files, processes, and http connections. To get started, the Python sections are linked at the left -- Python Set Up to get Python installed on your machine, Python Introduction for an introduction to the language, and then Python Strings starts the coding material, leading to the first exercise. This material was created by Nick Parlante working in the engEDU group at Google. Tip: Check out the Python Google Code University Forum to ask and answer questions.

Computer Networking : Principles, Protocols and Practice | INL: IP Networking Lab
Computer Networking : Principles, Protocols and Practice (aka CNP3) is an ongoing effort to develop an open-source networking textbook that could be used for in-depth undergraduate or graduate networking courses. The first edition of the textbook used the top-down approach initially proposed by Jim Kurose and Keith Ross for their Computer Networks textbook published by Addison Wesley. CNP3 is distributed under a Creative Commons license. The second edition of the ebook is now divided into two main parts. The first part of the ebook uses a bottom-up approach and focuses on the principles of computer networks without entering into protocol and practical details. Numerous exercises are also provided, as well as interactive quizzes that enable students to verify their understanding of the different chapters, and lab experiments with netkit and other software tools.

Ultimate Guide to Web… by Hartley Brody
Hopefully you learned a thing or two from my article I Don’t Need No Stinking API: Web Scraping For Fun and Profit. Due to the popularity of that article (almost 100,000 views), I decided to write an even more detailed survey of the field, full of all the web scraping tips and tricks I've picked up. The goal of the book, The Ultimate Guide to Web Scraping, is to hone your skills and help you become a master craftsman in the art of web scraping. We'll talk about the reasons why web scraping is a valid way to harvest information, despite common complaints.
