
Graphics Libraries


Manipulating PDFs with Python

PDF documents are beautiful things, but that beauty is often only skin deep. Inside, they can contain any number of structures that are difficult to understand and exasperating to get at. The PDF reference specification (ISO 32000-1) provides the rules, but programmers implement them, and, like all programmers, they are a creative bunch. In the end, a beautiful PDF document is meant to be read; its internals are not meant to be messed with. Well, we are programmers too, and just as creative, so let's see how we can get at those internals.
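To make "internals" concrete, here is a minimal standard-library sketch. It assembles a tiny one-page PDF by hand from the pieces ISO 32000-1 describes (numbered objects, a cross-reference table, and a trailer), then lists the objects and locates the cross-reference table the way a reader does, starting from `startxref` at the end of the file. The helpers `build_minimal_pdf`, `list_objects`, and `xref_offset` are hypothetical names for illustration; the text is assumed to contain no parentheses or backslashes, and the regular-expression scan only works on simple, uncompressed files like this one. Files with object streams or incremental updates need a real parser.

```python
import re


def build_minimal_pdf(text: str) -> bytes:
    """Hypothetical helper: assemble a one-page PDF by hand (illustration only).

    Assumes `text` has no parentheses or backslashes, which would need escaping.
    """
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "num 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always free, generation 65535
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.S)
TYPE_RE = re.compile(rb"/Type\s*/(\w+)")


def list_objects(pdf: bytes) -> list[tuple[int, str]]:
    """Return (object number, /Type) pairs, using '-' when there is no /Type."""
    found = []
    for match in OBJ_RE.finditer(pdf):
        type_match = TYPE_RE.search(match.group(3))
        found.append((int(match.group(1)), type_match.group(1).decode() if type_match else "-"))
    return found


def xref_offset(pdf: bytes) -> int:
    """Read the cross-reference table offset from the end of the file."""
    tail = pdf[pdf.rfind(b"startxref"):]
    return int(tail.split()[1])


pdf = build_minimal_pdf("Hello, PDF")
for number, kind in list_objects(pdf):
    print(number, kind)
print(pdf[xref_offset(pdf):].split(b"\n", 1)[0])
```

Object 4 has no `/Type` because it is the page's content stream; the text drawn on the page lives inside it as `(Hello, PDF) Tj`.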

Still, the best advice if you have to extract information from, or add information to, a PDF is: don't do it. If you cannot get the information further upstream, this tutorial will show you some of the ways to get inside a PDF using Python. Several Python packages can help (development status as given at the time of writing):

- pdfrw: last updated in 2012
- slate: active development
- PDFQuery: active development
- PDFMiner: active development
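Pulling text out by hand shows why the packages named above are worth using. As a rough illustration, and not a description of how PDFMiner or the other packages work internally, this standard-library sketch finds streams, inflates those marked `/FlateDecode` with `zlib`, and collects literal strings painted with the `Tj` operator. `naive_extract_text` is a hypothetical helper; it ignores `TJ` arrays, escaped parentheses, filters other than Flate, and font encodings (it simply assumes Latin-1 text).

```python
import re
import zlib

# A stream dictionary with no nested << >>, followed by its raw data.
STREAM_RE = re.compile(rb"<<([^<>]*)>>\s*stream\r?\n(.*?)\nendstream", re.S)
# A literal string painted with Tj; escaped parentheses are not supported.
TJ_RE = re.compile(rb"\(([^()]*)\)\s*Tj")


def naive_extract_text(pdf: bytes) -> list[str]:
    """Collect Tj strings from uncompressed or Flate-compressed streams."""
    texts = []
    for stream_dict, data in STREAM_RE.findall(pdf):
        if b"/FlateDecode" in stream_dict:
            inflater = zlib.decompressobj()
            try:
                # decompressobj keeps any trailing bytes (e.g. a stray "\r")
                # in unused_data instead of raising an error
                data = inflater.decompress(data) + inflater.flush()
            except zlib.error:
                continue  # not valid zlib data: skip this stream
        elif b"/Filter" in stream_dict:
            continue  # other filters (e.g. DCTDecode images) are out of scope
        for raw in TJ_RE.findall(data):
            # Assumption: single-byte text that Latin-1 decodes sensibly
            texts.append(raw.decode("latin-1"))
    return texts


content = b"BT /F1 12 Tf 72 720 Td (Hello from a compressed stream) Tj ET"
packed = zlib.compress(content)
sample = (b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
          + packed + b"\nendstream\nendobj\n")
print(naive_extract_text(sample))  # ['Hello from a compressed stream']
```

On real-world files, with custom font encodings, text split across `TJ` arrays, and compressed object streams, this approach breaks down quickly, which is where a dedicated package earns its keep.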

Images

- Tkinter
- scikit-image: image processing in Python
- OpenCV
- Pillow (Python-imaging)