PDFMiner
Last Modified: Mon Mar 24 12:02:47 UTC 2014

Python PDF parser and analyzer

What's It?
PDFMiner is a tool for extracting information from PDF documents.

Features
Written entirely in Python. PDFMiner is about 20 times slower than other C/C++-based counterparts such as XPdf.

Online Demo: (pdf -> html conversion webapp)

How to Install
Install Python 2.4 or newer.

For CJK languages
To process CJK languages, you need to take an additional step during installation:

    # make cmap
    python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt
    reading 'cmaprsrc/cid2code_Adobe_CNS1.txt'...
    writing 'CNS1_H.py'...
    ...

On Windows machines which don't have the make command, paste the following commands at a command line prompt:
The pyPdf.pdf Module
DocumentInformation() (class)
A class representing the basic document metadata provided in a PDF file. As of pyPdf v1.10, all text properties of the document metadata have two properties, e.g. author and author_raw. The non-raw property will always return a TextStringObject, making it ideal for a case where the metadata is being displayed.

author
Read-only property accessing the document's author.
Returns: A unicode string, or None if the author is not provided.

creator
Read-only property accessing the document's creator.
Returns: A unicode string, or None if the creator is not provided.

producer
Read-only property accessing the document's producer.
Returns: A unicode string, or None if the producer is not provided.

subject
Read-only property accessing the subject of the document.
Returns: A unicode string, or None if the subject is not provided.

title
Read-only property accessing the document's title.
Returns: A unicode string, or None if the title is not provided.

PageObject(pdf) (class)

artBox
HOWTO Use Python in the web — Python v2.7.5 documentation
Programming for the Web has become a hot topic since the rise of “Web 2.0”, which focuses on user-generated content on web sites. It has always been possible to use Python for creating web sites, but it was a rather tedious task. Therefore, many frameworks and helper tools have been created to assist developers in creating faster and more robust sites. This HOWTO describes some of the methods used to combine Python with a web server to create dynamic content.

The Low-Level View
When a user enters a web site, their browser makes a connection to the site’s web server (this is called the request). Dynamic web sites are not based on files in the file system, but rather on programs which are run by the web server when a request comes in, and which generate the content that is returned to the user.

Most HTTP servers are written in C or C++, so they cannot execute Python code directly – a bridge is needed between the server and the program. Not every web server supports every interface.
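To make the idea of "a program run by the web server that generates the returned content" concrete, here is a minimal sketch using WSGI, one of the bridging interfaces available in Python's standard library through `wsgiref`. The excerpt above does not name a specific interface, so the choice of WSGI is an assumption made for illustration, and the names `application` and `call_app` are hypothetical.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Hypothetical WSGI application: builds a small page for each request."""
    path = environ.get("PATH_INFO", "/")
    body = ("Hello from Python, you asked for %s\n" % path).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(path):
    """Call the application the way a server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a plausible request environment
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the application asks the server to send back.
        captured["status"] = status
        captured["headers"] = headers

    chunks = application(environ, start_response)
    return captured["status"], b"".join(chunks)
```

To serve it over HTTP for local experiments, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` should be enough; a production deployment would normally put a dedicated server in front of the same `application` callable.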
2. Built-in Functions — Python v2.7.5 documentation
open(name[, mode[, buffering]])

Open a file, returning an object of the file type described in section File Objects. If the file cannot be opened, IOError is raised. When opening a file, it’s preferable to use open() instead of invoking the file constructor directly.

The first two arguments are the same as for stdio’s fopen(): name is the file name to be opened, and mode is a string indicating how the file is to be opened.

The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position).

The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes).

Modes 'r+', 'w+' and 'a+' open the file for updating (reading and writing); note that 'w+' truncates the file.
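The modes described above can be compared side by side in a short sketch. The function name `demo_modes` and the file names are hypothetical, and a temporary directory is used so nothing outside it is touched. The buffering argument is left at its default here.

```python
import os
import tempfile


def demo_modes():
    """Exercise 'w', 'a', 'r', 'r+' and 'w+' on a scratch file."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "demo.txt")

    # 'w' creates the file (or truncates an existing one).
    with open(path, "w") as f:
        f.write("first\n")

    # 'a' adds to the end of the existing content.
    with open(path, "a") as f:
        f.write("second\n")

    with open(path, "r") as f:
        after_append = f.read()

    # 'r+' opens for updating without truncating; writing starts at offset 0.
    with open(path, "r+") as f:
        f.write("FIRST")
    with open(path, "r") as f:
        after_update = f.read()

    # 'w+' also allows reading, but the file is truncated on open.
    with open(path, "w+") as f:
        after_truncate = f.read()

    # Opening a missing file for reading raises IOError.
    try:
        open(os.path.join(directory, "missing.txt"), "r")
        missing_raises = False
    except IOError:
        missing_raises = True

    os.remove(path)
    os.rmdir(directory)
    return after_append, after_update, after_truncate, missing_raises
```

Using `with` closes each file as soon as its block ends, which keeps the later reads from seeing unflushed data.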
5. Data Structures — Python v3.3.0 documentation
This chapter describes some things you’ve learned about already in more detail, and adds some new things as well.

5.1. More on Lists
The list data type has some more methods.

list.append(x)
Add an item to the end of the list.

list.extend(L)
Extend the list by appending all the items in the given list.

list.insert(i, x)
Insert an item at a given position.

list.remove(x)
Remove the first item from the list whose value is x.

list.pop([i])
Remove the item at the given position in the list, and return it.

list.clear()
Remove all items from the list.

list.index(x)
Return the index in the list of the first item whose value is x.

list.count(x)
Return the number of times x appears in the list.

list.sort()
Sort the items of the list in place.

list.reverse()
Reverse the elements of the list in place.

list.copy()
Return a shallow copy of the list.

An example that uses most of the list methods:

List comprehensions provide a concise way to create lists. We can obtain the same result with:
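The original examples did not survive in this excerpt, so the following is a stand-in sketch rather than the documentation's own code. It walks through the list methods listed above and then builds the same list of squares twice, once with a loop and once with a list comprehension. The function name `list_methods_demo` and the sample data are made up for illustration.

```python
def list_methods_demo():
    """Apply each list method in turn and collect the intermediate results."""
    fruits = ["pear", "apple"]
    fruits.append("kiwi")             # add one item at the end
    fruits.extend(["apple", "fig"])   # append every item of another list
    fruits.insert(1, "plum")          # insert before index 1
    fruits.remove("apple")            # drop the first 'apple' only
    last = fruits.pop()               # remove and return the last item
    position = fruits.index("kiwi")   # index of the first 'kiwi'
    apples = fruits.count("apple")    # how many 'apple' entries remain
    snapshot = fruits.copy()          # shallow copy, unaffected by later changes

    fruits.sort()                     # in place, ascending
    fruits.reverse()                  # in place, now descending
    ordered = fruits.copy()

    fruits.clear()                    # empties the list; snapshot is untouched

    # Building a list with an explicit loop...
    squares_loop = []
    for x in range(5):
        squares_loop.append(x ** 2)

    # ...and the same result with a list comprehension.
    squares_comp = [x ** 2 for x in range(5)]

    return {
        "last": last,
        "position": position,
        "apples": apples,
        "snapshot": snapshot,
        "ordered": ordered,
        "emptied": fruits,
        "squares_loop": squares_loop,
        "squares_comp": squares_comp,
    }
```

Note that `clear()` and `copy()` belong to the Python 3.3 list API described here; on older interpreters `del fruits[:]` and `fruits[:]` play the same roles.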