Data Mining

Carrot2 Search Results Clustering Engine http://search.carrot2.org/stable/search

Clustering Engine

teamware/teamware-detail

Semantic Annotation (SA) is about attaching meaningful structures to resources such as documents or video streams in such a way that computers can use them to enhance the usefulness of those resources. SA is not new: when a BBC archivist, for example, attaches thesaurus categories to programme segments for indexing, they have performed semantic annotation. http://gate.ac.uk/teamware/teamware-detail.html
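
To make the idea concrete, here is a minimal Python sketch of stand-off annotation, in which labels are stored as character-offset spans alongside the text rather than written into it. GATE's own annotations are offset-based in a broadly similar way, but the `Document` and `Annotation` classes, the `category` feature and the sample transcript below are hypothetical illustrations, not GATE's Java API.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A stand-off annotation: it points into the text by character offsets
    instead of modifying the text itself."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    type: str   # annotation type, e.g. "Segment" (illustrative)
    features: dict = field(default_factory=dict)  # arbitrary metadata


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, start, end, type_, **features):
        """Attach a new annotation over text[start:end]."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError("annotation span out of range")
        ann = Annotation(start, end, type_, features)
        self.annotations.append(ann)
        return ann

    def covered_text(self, ann):
        """Return the slice of text that an annotation covers."""
        return self.text[ann.start:ann.end]

    def of_type(self, type_):
        """Return all annotations of the given type, in insertion order."""
        return [a for a in self.annotations if a.type == type_]


# Hypothetical programme transcript, split into two segments, each tagged
# with an illustrative thesaurus category (cf. the BBC archivist example).
text = "Flooding hit York overnight. Later, MPs debated the budget."
doc = Document(text)
split = text.index("Later")
doc.annotate(0, split - 1, "Segment", category="Weather")
doc.annotate(split, len(text), "Segment", category="Politics")

for seg in doc.of_type("Segment"):
    print(seg.features["category"], "->", doc.covered_text(seg))
```

Keeping annotations separate from the text means several annotators (or tools) can label the same document independently without interfering with each other's markup.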

teamware/index

http://gate.ac.uk/teamware/ Teamware is a web-based management platform for collaborative annotation & curation. It is a cost-effective environment for annotation and curation projects, enabling you to harness a broadly distributed workforce and monitor progress & results remotely in real time.

download/index

http://gate.ac.uk/download/ On this page you can find the latest stable release of GATE Developer and Embedded, as well as the latest nightly build snapshots. For other GATE products, please go to GATECloud.net or follow the links to the source code from our SourceForge pages. NOTE: if you are upgrading from one version of GATE to another, you must delete your user configuration file before running the new version.

family/process

http://gate.ac.uk/family/process.html The GATE Process describes the steps you need to take if you want to create predictable and sustainable language processing capabilities in your organisation. The process is supported by software (most notably GATE Teamware), but it is not primarily based on tools.

UIMA

http://en.wikipedia.org/wiki/UIMA UIMA stands for Unstructured Information Management Architecture. An OASIS standard[2] as of March 2009, UIMA is to date the only industry standard for content analytics.[citation needed]

Current Projects

http://www.ukp.tu-darmstadt.de/projects Semantic Information Management: Semantic Information Retrieval (SIR-3). This project systematically investigates the semantic and lexical relationships between words and concepts, and their usefulness in information retrieval (IR) tasks.

Data mining

http://en.wikipedia.org/wiki/Data_mining Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD),[1] an interdisciplinary subfield of computer science,[2][3][4] is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.[2] The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.[2] Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.[2]
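
One of the simplest pattern-discovery tasks is finding items that frequently occur together. The sketch below counts, by brute force, the support of every item pair (the fraction of transactions containing both items) and keeps those above a threshold. It is a simplified illustration of support counting, not a full Apriori or FP-growth implementation, and the `frequent_pairs` helper and sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {(item_a, item_b): support} for every item pair whose support,
    the fraction of transactions containing both items, is >= min_support."""
    if not transactions:
        return {}
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) counts each unordered pair once per basket
        # and gives it a canonical (alphabetical) key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
print(frequent_pairs(transactions, min_support=0.5))
# {('bread', 'milk'): 0.5, ('butter', 'milk'): 0.5}
```

Brute-force pair counting grows quadratically with basket size; algorithms such as Apriori prune candidates using the fact that every subset of a frequent itemset must itself be frequent.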

List of natural language processing toolkits

http://en.wikipedia.org/wiki/Outline_of_natural_language_processing#Natural_language_processing_toolkits The following outline is provided as an overview of and topical guide to natural language processing. What is natural language processing? Natural language processing (NLP): computerized processes intended to result in natural language understanding and natural language generation.

Languageware

http://en.wikipedia.org/wiki/Languageware LanguageWare is a natural language processing (NLP) technology developed by IBM, which allows applications to process natural language text.

General Architecture for Text Engineering

General Architecture for Text Engineering (GATE) is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for a broad range of natural language processing tasks, including information extraction in many languages. GATE has been compared to NLTK, R and RapidMiner.[1] As well as being widely used in its own right, it forms the basis of the KIM semantic platform.[2]

Carrot2

Carrot²[1] is an open-source search results clustering engine.[2] It can automatically cluster small collections of documents, e.g. search results or document abstracts, into thematic categories.
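
As a rough illustration of what clustering search results into thematic categories means, the following Python sketch groups snippets under terms they share and uses each term as the cluster label. It is a deliberately naive stand-in and does not reproduce Carrot²'s own algorithms (such as Lingo); the function names, stopword list, thresholds and sample snippets are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, minimal stopword list for the example.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "is"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def cluster_by_shared_terms(snippets, min_docs=2, max_clusters=5):
    """Group snippet indices under terms shared by at least min_docs snippets."""
    # Inverted index: term -> set of snippet indices containing it.
    postings = defaultdict(set)
    for i, snippet in enumerate(snippets):
        for term in set(tokenize(snippet)):
            postings[term].add(i)

    # Keep terms shared by enough snippets; larger groups first,
    # ties broken alphabetically so the output is deterministic.
    candidates = sorted(
        ((term, docs) for term, docs in postings.items() if len(docs) >= min_docs),
        key=lambda item: (-len(item[1]), item[0]),
    )
    clusters = {term: sorted(docs) for term, docs in candidates[:max_clusters]}

    # Snippets not covered by any labelled cluster go to "Other".
    assigned = set().union(*clusters.values())
    clusters["Other"] = [i for i in range(len(snippets)) if i not in assigned]
    return clusters


snippets = [
    "Jaguar car dealers and prices",
    "Jaguar speed in the wild",
    "Used car prices compared",
    "Wild cats of South America",
]
clusters = cluster_by_shared_terms(snippets)
for label, members in clusters.items():
    print(label, members)
```

Because a snippet can contain several shared terms, it may appear under more than one label, and near-duplicate groups such as `car` and `prices` show up separately; a production engine does considerably more work to choose readable labels and merge overlapping groups.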