
Natural Language Processing


Taking a look at Java-based Machine Learning by Classification

In this post I want to share some experiences in the field of “Machine Learning” that my current project recently led me to.

I will focus on “Data Classification” with the tool RapidMiner and give an overview of the topic. In particular, I would like to show how you can use this “stuff” from your Java application. If you have a background in architecting and developing enterprise software like I do, chances are high that you spend most of your time thinking about the structure of your software system: How can I arrange the code for the different features of my system so that all the architectural *abilities (Scalability, Maintainability, …) are met? To be honest, the features themselves are most often relatively simple: get some data from the GUI, validate it with mostly simple rules, store it in a database, and retrieve it later to present it on yet another GUI.

Lately I was pointed to a different kind of beast.

Document classification with Kofax Transformation Modules (KTM)

Many of our customers use systems for automatic document classification and data extraction.

‘Kofax Transformation Modules’ (KTM) is one of these systems. These data capture systems extract metadata from electronic images (the scanned pages of documents, faxes, or emails) and release both the data and the document to business applications.

Apache OpenNLP - Welcome to Apache OpenNLP.