
PDFtk - The PDF Toolkit

PDFtk is a simple tool for doing everyday things with PDF documents. It comes in three flavors: PDFtk Free, PDFtk Pro, and our original command-line tool, PDFtk Server. PDFtk Free is our friendly graphical tool for quickly merging and splitting PDF documents and pages. Power users: PDFtk Free comes with our command-line tool, PDFtk Server. Now available for Windows XP, Vista, Windows 7 and Windows 8. Use PDFtk Pro to quickly split, merge, rotate, watermark, stamp and secure PDF pages and documents. Power users: PDFtk Pro comes with our command-line tool, PDFtk Server. Only $3.99! Now available for Windows XP, Vista, Windows 7 and Windows 8. PDFtk Server is our original command-line tool. About PDF Labs: Our mission is to make PDF easier to use. PDF Labs is operated by Sid Steward, author of PDF Hacks (O’Reilly) and the popular PDF Toolkit. Please contact Sid Steward by email: sid.steward@pdflabs.com
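For scripting, PDFtk Server is driven from the command line with an operation keyword such as cat (concatenate pages or documents) or burst (write one file per page). A minimal Python sketch, assuming the pdftk binary is installed and on the PATH; the helper names merge_command, burst_command and run are hypothetical, not part of PDFtk:

```python
import subprocess

def merge_command(inputs, output):
    # pdftk IN1 IN2 ... cat output OUT  concatenates the inputs in order
    return ["pdftk", *inputs, "cat", "output", output]

def burst_command(source, pattern="pg_%04d.pdf"):
    # pdftk IN burst output PATTERN  writes one PDF per page of IN
    return ["pdftk", source, "burst", "output", pattern]

def run(cmd):
    # Requires PDFtk Server; raises CalledProcessError if pdftk fails.
    subprocess.run(cmd, check=True)
```

For example, run(merge_command(["a.pdf", "b.pdf"], "merged.pdf")) would write merged.pdf. Building the argument list separately from running it keeps the command easy to inspect or log first.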

The Stanford NLP (Natural Language Processing) Group. Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text that are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION); various other models for different languages and circumstances are also available, including models trained on just the CoNLL 2003 English training data. Stanford NER is also known as CRFClassifier. The CRF code is by Jenny Finkel. Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Getting started: this NER system requires Java 1.8 or later.
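Because NER assigns a label to each token, a common post-processing step is to join consecutive tokens that share a label into one entity. A minimal sketch, assuming the tagger's output has already been read into (token, label) pairs with "O" marking tokens outside any entity; group_entities is a hypothetical helper, not part of Stanford NER:

```python
def group_entities(tagged):
    """Merge runs of consecutive tokens with the same non-"O" label."""
    entities = []
    current_label, current_tokens = None, []
    for token, label in tagged:
        if label == current_label and label != "O":
            current_tokens.append(token)  # extend the running entity
            continue
        if current_label not in (None, "O"):
            entities.append((" ".join(current_tokens), current_label))
        current_label, current_tokens = label, [token]
    if current_label not in (None, "O"):  # flush the last entity
        entities.append((" ".join(current_tokens), current_label))
    return entities

tagged = [("Jenny", "PERSON"), ("Finkel", "PERSON"), ("works", "O"),
          ("at", "O"), ("Stanford", "ORGANIZATION"), ("in", "O"),
          ("California", "LOCATION")]
```

With this plain per-token labelling, two adjacent entities of the same class (two names in a row, say) come out as one; telling them apart would need a B-/I- style encoding.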

Piccolo Home Page: A Structured 2D Graphics Framework. Welcome to Piccolo! A revolutionary way to create robust, full-featured graphical applications in Java and C#, with striking visual effects such as zooming, animation and multiple representations. Piccolo is a toolkit that supports the development of 2D structured graphics programs in general, and Zoomable User Interfaces (ZUIs) in particular. A ZUI is a new kind of interface that presents a huge canvas of information on a traditional computer display by letting the user smoothly zoom in to get more detailed information, and zoom out for an overview.

Ghostscript

A lightweight XMP parser for extracting PDF metadata in Python | Matt's Blog. Metadata (title, author, etc.) can be embedded in PDF files in a number of different ways, and can be a bit of a pain to extract. Older PDFs use an “Info” dictionary referenced from the trailer, whereas newer ones use XMP metadata. Using the Python PDFMiner library, it’s possible to extract the “Info” as a Python dictionary, but the XMP metadata is just extracted as raw XML. I couldn’t find a nice lightweight XMP parser in Python, so I put together something that seemed to work on all the PDFs I threw at it. You can install PDFMiner by downloading the source, then running:

    cd pdfminer
    make cmap
    python setup.py install

Once installed, use PDFMiner to open the PDF and get the XMP. The xmp_to_dict function is defined as follows:
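As a rough illustration of the same idea (not the post's xmp_to_dict), here is an independent sketch that uses only the standard library's xml.etree.ElementTree. The name parse_xmp and the output shape (each namespace URI mapped to a dict of property values) are assumptions. It takes the raw XMP XML as str or bytes, handles properties written either as child elements or as attributes of rdf:Description, and turns rdf:Seq, rdf:Bag and rdf:Alt containers into lists:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def _split(tag):
    """Split an ElementTree name "{uri}local" into (uri, local)."""
    uri, _, local = tag[1:].partition("}")
    return uri, local

def _value(elem):
    """Plain text, or a list of items for rdf:Seq / rdf:Bag / rdf:Alt."""
    for kind in ("Seq", "Bag", "Alt"):
        container = elem.find(f"{{{RDF}}}{kind}")
        if container is not None:
            return [(li.text or "").strip() for li in container.findall(f"{{{RDF}}}li")]
    return (elem.text or "").strip()

def parse_xmp(xmp):
    """Map each namespace URI to {property name: value} for an XMP packet."""
    root = ET.fromstring(xmp)
    result = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        # Short form: properties written as attributes, e.g. pdf:Producer="...".
        for key, val in desc.attrib.items():
            if key.startswith("{"):
                uri, name = _split(key)
                if uri != RDF:  # skip rdf:about and other RDF syntax attributes
                    result.setdefault(uri, {})[name] = val
        # Long form: properties written as child elements.
        for child in desc:
            if isinstance(child.tag, str) and child.tag.startswith("{"):
                uri, name = _split(child.tag)
                result.setdefault(uri, {})[name] = _value(child)
    return result
```

Keying by namespace URI rather than prefix is deliberate: prefixes such as dc: are only conventions and can differ between files, while the URIs are fixed.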

Python config file parser. I needed a configuration file and did not want to write yet another parser for it and document the format. Instead of putting up with ConfigParser, I decided to use the syntax and parser already available in Python, but restricted to a "safe" subset of the Python syntax: assignments, bool, dict, list, string, float, and, or, xor, arithmetic, string expressions and if..then..else. The "unsafe" Python statements are deleted from the code by editing the Python parse tree. The configuration is passed to the parser as a string, either read from a file or taken from a string constant containing the default values. Note: the source is actually in two parts: the configuration parser and the test script for it. PS: The difference between parser.suite and parser.expr is not quite clear to me.
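The parser module used here was removed in Python 3.10. A comparable sketch with the ast module, assuming Python 3.8 or later: ast.parse with mode="exec" plays the role of parser.suite (mode="eval" corresponds to parser.expr). This is a swapped-in technique rather than the author's code. Instead of deleting unsafe statements from the tree, it rejects any node type outside a whitelist before executing. parse_config and its defaults dict are hypothetical names:

```python
import ast

# Node types a config file may contain; everything else is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.Dict, ast.List, ast.Tuple,
    ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not, ast.USub,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.BitXor,
    ast.IfExp, ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)

def parse_config(source, defaults=None):
    """Run a restricted subset of Python and return the names it assigns."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    # Defaults (e.g. from parsing a default-values string first) are
    # visible to, and can be overridden by, the config source.
    namespace = dict(defaults or {})
    # No builtins are exposed; the whitelist already excludes calls.
    exec(compile(tree, "<config>", "exec"), {"__builtins__": {}}, namespace)
    return namespace
```

Rejecting rather than silently deleting makes an injected import or a typo fail loudly. The whitelist is intentionally small, though it still allows things like large string repetition, so it is not a resource limit.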

libjpeg

pdfssa4met - PDF Structure and Syntactic Analysis for Metadata Extraction and Tagging. PDFSSA4MET attempts to provide metadata extraction and tagging based on structural and syntactic analysis of content in XML. Capabilities: given PDFs that conform to a fairly conventional structure (e.g. scholarly works), it attempts to extract and tag headings (title, author, chapter/section headings), references (title, volume, page numbers), cited publications and URLs, and suggested social tags. Headings are identified by looking for text that deviates from the norm in terms of size, colour, or weight (bold). References are identified by looking for patterns in the "References" section. Titles, authors, headings, references and their component elements are tagged with quick-and-dirty XML tags. The scripts will have varying success between different PDFs, but will hopefully become more consistent and reliable with additional testing. Dependencies: PDF-to-XML conversion by a binary available from SourceForge, Python 2.6+, lxml, rdflib. Download: source code is available as a gzipped TAR archive and via Subversion.
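The heading heuristic can be sketched in a few lines: take the font size that covers the most text as the norm, then flag lines that are noticeably larger, or bold at body size or above. This is an illustration, not pdfssa4met's code; the (text, font_size, is_bold) input layout and the 1.15 threshold are assumptions, and colour is ignored:

```python
from collections import Counter

def find_headings(lines, min_ratio=1.15):
    """Flag lines whose font is noticeably larger than, or bolder than, body text.

    lines: iterable of (text, font_size, is_bold) tuples, e.g. gathered from
    a PDF-to-XML dump (hypothetical layout; colour is not considered).
    """
    lines = list(lines)
    if not lines:
        return []
    # The "norm" is the font size that covers the most characters.
    chars_per_size = Counter()
    for text, size, _bold in lines:
        chars_per_size[size] += len(text)
    body_size = chars_per_size.most_common(1)[0][0]
    return [
        text
        for text, size, bold in lines
        if size >= body_size * min_ratio or (bold and size >= body_size)
    ]
```

Weighting by character count rather than line count keeps a handful of short, large lines (title, section numbers) from being mistaken for the norm.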

boolopt is a Python module for optimizing propositional logic. It was written during my internship in Cape Town at NBN, South Africa’s National Bioinformatics Network; they used it for query optimization. The module provides a function, optimize, which takes a formula in disjunctive normal form (without intraclause contradictions and duplicates) and outputs an equivalent, minimal term. There is an existing implementation, but I couldn’t get it to work, so I wrote my own. It has worked instantaneously on all of the (small) examples I’ve thrown at it, but the problem is NP-complete and the method involves building a truth table, so it needs space exponential in the number of propositions. I release it here under the New BSD license: you can do whatever you want with the code as long as you leave the license and copyright information at the top. New versions will be announced on the main site feed. Installation: open up the archive wherever you like; it will create its own directory, boolopt-1.0.
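The truth-table approach described above can be sketched with the classic Quine-McCluskey method followed by an exhaustive search for the smallest cover. This is not boolopt's code or API: minimize_dnf is a hypothetical name, each clause is assumed to be a dict mapping a variable to True/False, and "minimal" here means fewest clauses:

```python
from itertools import combinations, product

def minimize_dnf(clauses):
    """Return a smallest equivalent DNF (fewest clauses).

    clauses: list of dicts mapping a variable name to True/False; each dict is
    one conjunction and the list is their disjunction (hypothetical format).
    """
    names = sorted({v for clause in clauses for v in clause})
    n = len(names)
    # 1. Truth table: collect every assignment that satisfies some clause.
    minterms = set()
    for bits in product((False, True), repeat=n):
        env = dict(zip(names, bits))
        if any(all(env[v] == val for v, val in c.items()) for c in clauses):
            minterms.add(bits)
    # 2. Quine-McCluskey: merge terms that differ in exactly one fixed
    #    position (None marks a "don't care"); unmerged terms are prime.
    terms, primes = set(minterms), set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(terms, 2):
            diff = [i for i in range(n) if a[i] != b[i]]
            if len(diff) == 1 and None not in (a[diff[0]], b[diff[0]]):
                i = diff[0]
                merged.add(a[:i] + (None,) + a[i + 1:])
                used.update((a, b))
        primes |= terms - used
        terms = merged
    # Sort for a deterministic result (None sorts after False/True).
    ordered = sorted(primes, key=lambda t: [2 if x is None else int(x) for x in t])

    def covers(term, m):
        return all(t is None or t == bit for t, bit in zip(term, m))

    # 3. Exhaustive search for the smallest set of primes covering every minterm.
    for k in range(len(ordered) + 1):
        for chosen in combinations(ordered, k):
            if all(any(covers(p, m) for p in chosen) for m in minterms):
                return [{names[i]: t[i] for i in range(n) if t[i] is not None}
                        for t in chosen]
```

Both the 2**n truth table and the subset search in step 3 grow exponentially, which matches the caveat above: fine for small formulas, hopeless for large ones.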

libpng Home Page. libpng is the official PNG reference library. It supports almost all PNG features, is extensible, and has been extensively tested for over 20 years. The home site for development versions (i.e., which may be buggy, subject to change, or include experimental features) is and the place to go for questions about the library is the png-mng-implement mailing list. libpng is available as ANSI C (C89) source code and requires zlib 1.0.4 or later (1.2.5 or later recommended for performance and security reasons). The portability notice should not come as a particular surprise to anyone who has added libpng support to an application this millennium; the manual has warned of it since at least July 2000. The 1.5.x and later series also include a new, more thorough test program (pngvalid.c) and a new pnglibconf.h header file that tracks what features were enabled or disabled when libpng was built.

Python to get Media Metadata. At Babo Labs, we're interested in eliminating work for our digital merchants by providing them with enabling technologies. An enabling technology is one that helps a user complete a task more productively and efficiently while minimizing intrusiveness or inconvenience. One example of an enabling technology is Google's instant search bar, which shows search engine results in real time as you type your query (statistics show this service saves 2-5 seconds per query on average). One way our social e-commerce platform, Babolog, accomplishes this is by passively and dynamically collecting meta information about the digital media files our merchants upload, and then displaying these meaningful specifications to their customers. Over the past month, Stephen and I have tested a variety of Python modules for extracting metadata from media files.

kaa.metadata. Installation on Ubuntu:

    sudo apt-get install python-kaa-metadata

Example:

    import kaa.metadata

    def getKaaMetadata(filepath):
        # parse() inspects the file and returns an object describing its metadata
        meta = kaa.metadata.parse(filepath)
        print(meta)

ruleCore 1.1beta3: Complex Event Pattern Detector. The ruleCore Engine is an event-driven rule engine that manages and executes reaction rules. It provides capabilities for detecting complex patterns of events, called situations. The ruleCore Engine is fed with events through connectors.
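A situation in this sense is a pattern over several events rather than a single event. As a toy illustration only (this is not ruleCore's API or rule language), the sketch below detects one common pattern: an event of one kind followed by another kind within a time window. The class names, event kinds and the window parameter are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str
    time: float

class SequenceDetector:
    """Reports (start, end) when `first` is followed by `second` within `window`."""

    def __init__(self, first, second, window):
        self.first, self.second, self.window = first, second, window
        self.pending = []  # times of `first` events not yet matched

    def feed(self, event):
        # Forget `first` events too old to complete the pattern.
        self.pending = [t for t in self.pending if event.time - t <= self.window]
        if event.kind == self.first:
            self.pending.append(event.time)
            return None
        if event.kind == self.second and self.pending:
            return (self.pending.pop(0), event.time)  # situation detected
        return None

detector = SequenceDetector("login_failed", "password_reset", window=60)
events = [Event("login_failed", 0), Event("page_view", 10),
          Event("password_reset", 30), Event("login_failed", 100),
          Event("password_reset", 200)]
situations = [s for s in (detector.feed(e) for e in events) if s]
```

Here only the first pair forms a situation; the second reset arrives 100 seconds after its failed login, outside the 60-second window. Keeping per-pattern state in the detector is what lets events arrive one at a time, as they would from a connector.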
