The Stanford NLP (Natural Language Processing) Group. About: Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION), and we also make available on this page various other models for different languages and circumstances, including models trained on just the CoNLL 2003 English training data. Stanford NER is also known as CRFClassifier. The CRF code is by Jenny Finkel. Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Getting started: This NER system requires Java 1.8 or later.
PDFJoin! - Join PDF files online for free. Piccolo Home Page A Structured 2D Graphics Framework Welcome to Piccolo! A revolutionary way to create robust, full-featured graphical applications in Java and C#, with striking visual effects such as zooming, animation and multiple representations. Piccolo is a toolkit that supports the development of 2D structured graphics programs, in general, and Zoomable User Interfaces (ZUIs), in particular. A ZUI is a new kind of interface that presents a huge canvas of information on a traditional computer display by letting the user smoothly zoom in, to get more detailed information, and zoom out for an overview. Why use Piccolo? What exactly is it?
A lightweight XMP parser for extracting PDF metadata in Python | Matt's Blog Metadata (title, author, etc.) can be embedded in PDF files in a number of different ways, and can be a bit of a pain to extract. Older PDFs use “Info” in the XRefs trailer, whereas newer ones use XMP metadata. Using the Python PDFMiner library, it’s possible to extract the “Info” as a Python dictionary, but the XMP metadata is just extracted as raw XML. I couldn’t find a nice lightweight XMP parser in Python, so I put together something that seemed to work on all the PDFs I threw at it. You can install PDFMiner by downloading the source, then doing: cd pdfminer; make cmap; python setup.py install. Once installed, use PDFMiner to open the PDF and get the XMP. The xmp_to_dict function is defined as follows:
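The author's xmp_to_dict listing is not part of this excerpt. As a rough illustration of the idea only, here is a minimal sketch that flattens the raw XMP XML into a Python dictionary using the standard library. The name xmp_to_dict_sketch and its helpers are hypothetical, not the blog's code, and it assumes the XMP packet has already been extracted as a string.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def _local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on names."""
    return tag.rsplit("}", 1)[-1]


def _value(elem):
    """Return a list for rdf:Seq/Bag/Alt containers, else the element text."""
    items = [li.text or "" for li in elem.iter("{%s}li" % RDF_NS)]
    if items:
        return items
    return (elem.text or "").strip()


def xmp_to_dict_sketch(xmp_xml):
    """Flatten an XMP packet into {property_name: value} (hypothetical helper)."""
    root = ET.fromstring(xmp_xml)
    result = {}
    for desc in root.iter("{%s}Description" % RDF_NS):
        # Properties may be written as attributes on rdf:Description ...
        for name, value in desc.attrib.items():
            if not name.startswith("{%s}" % RDF_NS):
                result[_local(name)] = value
        # ... or as child elements of rdf:Description.
        for child in desc:
            result[_local(child.tag)] = _value(child)
    return result
```

Namespace prefixes are dropped for brevity, so two properties with the same local name in different namespaces would collide; a fuller parser would keep the namespace as part of the key.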
How to encrypt and certify a PDF. Encrypt and certify your PDFs. If you use small free tools like CutePDF, or websites, to generate your PDF files, you may need to add a layer of security on top by encrypting and/or certifying them. For that there is a small program called iSafePDF that can do all of this in a few clicks (Windows only). And of course, all of these standard modifications are recognized by PDF readers such as Adobe Reader. Edit the PDF's information. Encrypt the PDF. Certify the PDF. To download iSafePDF, this is the place to go...
Python config file parser I needed a configuration file and did not want to write yet-another-parser for it and document the format. Instead of putting up with ConfigParser I decided to use the syntax and parser already available in Python, but restricted to a "safe" subset of the Python syntax: assignments, bool, dict, list, string, float, and, or, xor, arithmetic, string expressions and if..then..else. The "unsafe" Python statements are deleted from the code by editing the Python parse tree. The configuration is passed to the parser as a string, either read from a file or taken from a string constant containing the default values. Note: The source is actually in two parts: the configuration parser and the test script for it. PS: The difference between parser.suite and parser.expr is not quite clear to me.
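The recipe's own source is not included in this excerpt, and the parser module it refers to (parser.suite, parser.expr) was removed in Python 3.10. The following is a minimal sketch of the same idea using the ast module instead. Two things differ from the description above and are assumptions of this sketch: it rejects disallowed syntax rather than deleting it, and the name parse_config and the ALLOWED whitelist are hypothetical.

```python
import ast

# Node types this sketch accepts; anything else (imports, calls,
# attribute access, function definitions, ...) is rejected.
ALLOWED = (
    ast.Module, ast.Assign, ast.Name, ast.Store, ast.Load, ast.Constant,
    ast.Dict, ast.List, ast.Tuple, ast.BinOp, ast.UnaryOp, ast.BoolOp,
    ast.Compare, ast.IfExp, ast.operator, ast.unaryop, ast.boolop, ast.cmpop,
)


def parse_config(source, defaults=None):
    """Evaluate a restricted-Python config string and return its variables."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError("disallowed syntax: %s" % type(node).__name__)
    namespace = dict(defaults or {})
    # Empty builtins: names such as open or __import__ are not reachable.
    exec(compile(tree, "<config>", "exec"), {"__builtins__": {}}, namespace)
    return namespace
```

A call such as parse_config("port = 8000 + 80", defaults={"debug": False}) returns a dictionary holding both the default and the new value. The whitelist only limits syntax; it does not bound run time or memory, so an expression like 9 ** 9 ** 9 would still be accepted.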
pdfssa4met - PDF Structure and Syntactic Analysis for Metadata Extraction and Tagging. PDFSSA4MET attempts to provide metadata extraction and tagging based on structural and syntactic analysis of content in XML. Capabilities: Given PDFs that conform to a fairly conventional structure (e.g. scholarly works), attempts to extract and tag: headings (title, author, chapter / section headings); references (title, volume, page numbers); cited publications and URLs; suggested social tags. Headings are identified by looking for text that deviates from the norm in terms of size, colour, or weight (bold). References are identified by looking for patterns in the "References" section. Titles, Authors, Headings, References and the component elements are tagged with quick and dirty XML tags. Scripts will have varying success between different PDFs, but will hopefully become more consistent and reliable with additional testing. Dependencies: PDF to XML conversion by binary available from sourceforge; Python 2.6+; lxml; rdflib. Download: Source code available as gzipped TAR archive and via Subversion. Usage
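The heading heuristic described above can be illustrated with a small sketch. This is not pdfssa4met's code: the function name find_headings and the (text, font_size, is_bold) input format are assumptions made for the example, and colour is left out for brevity.

```python
from collections import Counter


def find_headings(runs):
    """Return the text runs that look like headings.

    runs: list of (text, font_size, is_bold) tuples in reading order
    (a hypothetical format; a real tool would read these from the XML).
    """
    # Treat the most frequent font size as body text, i.e. "the norm".
    body_size = Counter(size for _, size, _ in runs).most_common(1)[0][0]
    # Anything larger than body text, or set in bold, counts as a heading.
    return [text for text, size, bold in runs if size > body_size or bold]
```

Using the most common size as the baseline keeps the sketch independent of any particular document's typography, at the cost of misfiring on documents where body text is not the dominant size.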
boolopt is a Python module for optimizing propositional logic. It was written during my internship in Cape Town at NBN, South Africa’s National Bioinformatics Network; they used it for query optimization. The module provides a function optimize, which takes a disjunctive-normal formula (without intraclause contradictions and duplicates) and outputs an equivalent, minimal term. There is an existing implementation, but I couldn’t get it to work, which is why I wrote my own. It’s worked instantaneously on all of the (small) examples I’ve thrown at it, but the method is NP-complete and involves building a truth-table; it’s EXPSPACE in the number of propositions. I release it here under the NewBSD license: you can do whatever you want with the code so long as you leave the license and copyright information at the top. New versions will be announced on the main site feed. Download Installation: Open up the archive wherever; it will create its own directory boolopt-1.0. Examples Fun facts
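To make the truth-table approach concrete, here is a minimal sketch. It is not boolopt's optimize function or its algorithm: the names truth_table and simplify and the clause representation are assumptions, and the greedy reduction only guarantees a formula from which no further clause or literal can be dropped, which is not necessarily a globally minimal one.

```python
from itertools import product


def truth_table(dnf, names):
    """Evaluate a DNF formula for every assignment of the given names.

    dnf is a list of clauses; each clause is a dict {name: required_value}.
    """
    rows = []
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        rows.append(any(all(env[n] == want for n, want in clause.items())
                        for clause in dnf))
    return rows


def simplify(dnf):
    """Greedily drop clauses and literals while the truth table is unchanged."""
    names = sorted({n for clause in dnf for n in clause})
    target = truth_table(dnf, names)
    current = [dict(c) for c in dnf]
    changed = True
    while changed:
        changed = False
        # First try to drop a whole clause.
        for i in range(len(current)):
            trial = current[:i] + current[i + 1:]
            if truth_table(trial, names) == target:
                current, changed = trial, True
                break
        if changed:
            continue
        # Otherwise try to drop a single literal from some clause.
        for i, clause in enumerate(current):
            for n in clause:
                shorter = {k: v for k, v in clause.items() if k != n}
                trial = current[:i] + [shorter] + current[i + 1:]
                if truth_table(trial, names) == target:
                    current, changed = trial, True
                    break
            if changed:
                break
    return current
```

For example, (a and b) or (a and not b), written as [{"a": True, "b": True}, {"a": True, "b": False}], reduces to [{"a": True}]. The table has 2^n rows for n propositions, which is the exponential cost the author mentions.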
Python to get Media Metadata At Babo Labs, we're interested in eliminating work for our digital merchants by providing them with enabling technologies. An enabling technology is one that assists a user in completing a task more productively and efficiently, while minimizing intrusiveness or inconvenience. One example of an enabling technology is Google's instant search bar, which shows search engine results as you type your query, in real time (statistics show this service saves 2-5 seconds per query on average). One way our social e-commerce platform, Babolog, accomplishes this is by passively and dynamically collecting meta information about the digital media files our merchants upload, and then displaying these meaningful specifications to their customers. Over the past month, Stephen and I have tested a variety of Python modules for extracting metadata from media files. Documentation Installation (Ubuntu, via apt): sudo apt-get install python-kaa-metadata

    #example:
    import kaa.metadata
    def getKaaMetadata(filepath):
        # This line is missing from the excerpt; parse() is assumed here
        # as the call that reads the file's metadata.
        meta = kaa.metadata.parse(filepath)
        print meta
ruleCore 1.1beta3: Complex Event Pattern Detector. The ruleCore Engine is an event-driven rule engine that manages and executes reaction rules. The ruleCore Engine provides capabilities for detection of complex patterns of events, called situations. The ruleCore Engine is fed with events through connectors.
DLE using Python | Datalogics Blog I’ve always had an appreciation for the higher level languages, the ones that make life easier, that let you code rather than worry about the housekeeping. C# is an improvement over coding in C or C++, since it relieves you of many of the burdens of tracking pointers and object ownership. You still have to compile the program before you can run it. Scripting languages like Python give the best of both worlds. Programs don’t require compilation before being run, and in fact, you can type commands to an interactive console, just like in the old days of BASIC. I’ve been something of a Pythonista for a long time now, and I’ve always wanted to access the PDF Library from Python. Before you go digging in the distribution to find the secret Python bindings, I’ll tell you there aren’t any. Both mix the ease of use of Python with direct access to the features of the underlying VM. For this article, I’m going to focus on Jython. Getting started with Jython On Windows it’s sufficient to