ECML/PKDD'02 Tutorial on Text Mining and Internet Content filtering
José María Gómez Hidalgo, Departamento de Inteligencia Artificial, Universidad Europea de Madrid
In recent years, we have witnessed impressive growth in the availability of information in electronic format, mostly in the form of text, due to the Internet and the increasing number and size of digital and corporate libraries. The overwhelming amount of text is hard for an average human being to consume, and the reader faces an information overload problem.
American English Pronunciation Lesson: Wh- question Pitch Boundaries
Introduction to wh-questions
A wh-question begins with a word such as who, what, why, when, where, or how. These types of questions seek information and cannot be answered with "yes" or "no." Wh-questions can end with a rising or falling pitch boundary, depending on whether the speaker is truly asking a question or is masking a suggestion as a question.
Rising pitch boundary in wh-question
When the speaker holds no assumption as to what the answer will be, and the topic is new, the wh-question is likely to have a rising pitch boundary.
Wolfram
Natural language processing
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.
Dictionary of American Regional English
"Arthur the Rat" is a short tale devised to obtain, from speakers throughout the country, a phonetic representation of all the phonemes in American English.
a free, commonsense-enriched natural language understander
Recent bugfixes
Version 2.1 (6 Aug 2004) - includes a new MontyNLGenerator component that generates sentences and summaries
Version 2.0.1 - fixes an API bug in version 2.0 that prevented the Java API from being callable
What is MontyLingua?
MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English.
ConceptNet
What is ConceptNet?
ConceptNet is a freely available commonsense knowledgebase and natural-language-processing toolkit that supports many practical textual-reasoning tasks over real-world documents right out of the box (without additional statistical training), including topic-gisting (e.g. a news article containing the concepts “gun,” “convenience store,” “demand money” and “make getaway” might suggest the topics “robbery” and “crime”), affect-sensing (e.g. this email is sad and angry), analogy-making (e.g. “scissors,” “razor,” “nail clipper,” and “sword” are perhaps like a “knife” because they are all “sharp,” and can be used to “cut something”), text summarization, contextual expansion, causal projection, cold document classification, and other context-oriented inferences.
The ConceptNet knowledgebase is a semantic network presently available in two versions: concise (200,000 assertions) and full (1.6 million assertions).
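The analogy-making example above can be illustrated with a toy semantic network. This is only a sketch under stated assumptions: it does not use ConceptNet's API or data. The assertions, the relation names (`HasProperty`, `UsedFor`), and the helpers `build_index` and `analogous_to` are all hypothetical, chosen to mirror the knife example.

```python
from collections import defaultdict

# Toy (concept, relation, value) assertions. Hypothetical data, not taken from ConceptNet.
ASSERTIONS = [
    ("knife", "HasProperty", "sharp"),
    ("knife", "UsedFor", "cut something"),
    ("scissors", "HasProperty", "sharp"),
    ("scissors", "UsedFor", "cut something"),
    ("razor", "HasProperty", "sharp"),
    ("razor", "UsedFor", "cut something"),
    ("sword", "HasProperty", "sharp"),
    ("sword", "UsedFor", "cut something"),
    ("spoon", "UsedFor", "eat soup"),
]


def build_index(assertions):
    """Map each concept to the set of (relation, value) pairs asserted about it."""
    index = defaultdict(set)
    for concept, relation, value in assertions:
        index[concept].add((relation, value))
    return index


def analogous_to(target, index, min_shared=2):
    """Return concepts sharing at least min_shared features with target, sorted by name."""
    target_features = index.get(target, set())
    matches = []
    for concept, features in index.items():
        if concept == target:
            continue
        if len(features & target_features) >= min_shared:
            matches.append(concept)
    return sorted(matches)


INDEX = build_index(ASSERTIONS)
print(analogous_to("knife", INDEX))
```

With this toy data, the concepts reported as knife-like are the ones that are both "sharp" and used to "cut something"; the spoon shares neither feature and is left out.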
The `Bow' Toolkit
Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow). The library and its front-ends were designed and written by Andrew McCallum, with some contributions from several graduate and undergraduate students. The name of the library rhymes with `low', not `cow'.
Maximum Entropy Modeling Using SharpEntropy
Overview
This article presents a maximum entropy modeling library called SharpEntropy and discusses its usage, first by way of a simple example of predicting outcomes, and second by presenting a way of splitting English sentences into constituent tokens (useful for natural language processing). Please note that because most of the code is a conversion based on original Java libraries published under the LGPL license, the source code available for download with this article is also released under the LGPL license. This means it can be freely used in software released under any sort of license, but if you make changes to the library itself and those changes are not for your private use, you must release the source code for those changes.
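For readers unfamiliar with the idea of predicting outcomes with a maximum entropy model, a minimal sketch follows. It is not SharpEntropy's API and not necessarily its training procedure: the model is a small conditional maximum entropy (multinomial logistic) classifier fitted by plain gradient ascent, and the training data, feature names, and the functions `probabilities`, `train`, and `predict` are all hypothetical.

```python
import math

# Hypothetical training data: (set of active context features, outcome).
TRAINING = [
    ({"sunny", "warm"}, "play"),
    ({"sunny", "cool"}, "play"),
    ({"rainy", "cool"}, "stay_in"),
    ({"rainy", "warm"}, "stay_in"),
]
OUTCOMES = sorted({outcome for _, outcome in TRAINING})


def probabilities(weights, context):
    """Return p(outcome | context), proportional to exp(sum of active feature weights)."""
    scores = {o: sum(weights.get((f, o), 0.0) for f in context) for o in OUTCOMES}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {o: math.exp(s - top) for o, s in scores.items()}
    total = sum(exps.values())
    return {o: e / total for o, e in exps.items()}


def train(data, iterations=200, learning_rate=0.5):
    """Fit one weight per (feature, outcome) pair by gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(iterations):
        gradient = {}
        for context, outcome in data:
            probs = probabilities(weights, context)
            for f in context:
                for o in OUTCOMES:
                    # Gradient term: observed count minus count expected under the model.
                    observed = 1.0 if o == outcome else 0.0
                    gradient[(f, o)] = gradient.get((f, o), 0.0) + observed - probs[o]
        for key, g in gradient.items():
            weights[key] = weights.get(key, 0.0) + learning_rate * g
    return weights


def predict(weights, context):
    """Return the most probable outcome for the given context features."""
    probs = probabilities(weights, context)
    return max(probs, key=probs.get)


WEIGHTS = train(TRAINING)
print(predict(WEIGHTS, {"sunny", "cool"}))
```

In this toy data only the "sunny" and "rainy" features carry information about the outcome, so training pushes their weights apart while the "warm" and "cool" weights stay at zero.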
Brill POS Tagger for Win32 Paul Maddox
LDC - Linguistic Data Consortium - Current Projects