Natural language

Google Books Ngram Datasets

http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html Syntactic N-grams
http://noduslabs.com/research/pathways-meaning-circulation-text-network-analysis/

Identifying the Pathways for Meaning Circulation using Text Network Analysis

By Dmitry Paranyushkin, Nodus Labs. Published October 2011, Berlin. Abstract: In this work we propose a method and algorithm for identifying the pathways for meaning circulation within a text.
The advent of humanoid robots has enabled a new approach to investigating the acquisition of language, and we report on the development of robots able to acquire rudimentary linguistic skills. Our work focuses on early stages analogous to some characteristics of a human child of about 6 to 14 months, the transition from babbling to first word forms. We investigate one mechanism among many that may contribute to this process, a key factor being the sensitivity of learners to the statistical distribution of linguistic elements. As well as being necessary for learning word meanings, the acquisition of anchor word forms facilitates the segmentation of an acoustic stream through other mechanisms. In our experiments some salient one-syllable word forms are learnt by a humanoid robot in real-time interactions with naive participants. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0038236

Interactive Language Learning by Robots: The Transition from Babbling to Word Forms

http://www.analytictech.com/geneva97/thematic.htm The basic purpose of thematic coding (or "tagging") is data retrieval. It is used to classify text according to theme, so that later on, when doing analysis, it is easy to retrieve all passages that relate to a given topic. The essence of thematic coding is classification. Consider, for example, the following passage:

Thematic Coding
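
The retrieval idea described for thematic coding can be sketched in a few lines of Python. This is only an illustration: in practice a human analyst usually assigns the codes, and the keyword matching here is a stand-in for that judgment. The codebook, the theme names, and the sample passages are all made up for the example and do not come from the linked page.

```python
from collections import defaultdict

# Hypothetical codebook: theme name -> keywords that signal the theme.
CODEBOOK = {
    "health": {"doctor", "illness", "clinic"},
    "money": {"salary", "rent", "debt"},
}


def code_passages(passages, codebook):
    """Return an index mapping each theme to the passages tagged with it."""
    index = defaultdict(list)
    for passage in passages:
        # Normalize words: strip surrounding punctuation and lowercase.
        words = {w.strip(".,;:!?\"'").lower() for w in passage.split()}
        for theme, keywords in codebook.items():
            # A passage receives a theme if it shares any keyword with it.
            if words & keywords:
                index[theme].append(passage)
    return dict(index)


# Invented sample passages, standing in for interview or field-note text.
passages = [
    "The clinic was too far away, so we never saw a doctor.",
    "Most of my salary goes to rent.",
    "After the illness I fell into debt.",
]

index = code_passages(passages, CODEBOOK)

# Retrieval step: pull every passage coded with a given theme.
for passage in index["health"]:
    print(passage)
```

A passage can carry more than one code (the third sample is tagged with both themes), which is why the index maps each theme to a list rather than assigning one theme per passage.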

http://www.rene-pickhardt.de/typology-using-neo4j-wins-2-awards-at-the-german-federal-competition-young-scientists/ Two days ago I arrived in Erfurt to visit the federal young scientists competition (Jugend Forscht). I reported on the project Typology by Till Speicher and Paul Wagner, which I supervised over the last half year and which has already won many awards. On Saturday night they won a special award donated by the Gesellschaft fuer Informatik. The award is titled “special award for a contribution which particularly demonstrates the usefulness of computer science for society” (Sonderpreis fuer eine Arbeit, die in besonderer Art und Weise den Nutzen der Informatik verdeutlicht).

Typology using neo4j wins 2 awards at the German federal competition young scientists.

http://www.theatlantic.com/entertainment/archive/2012/04/can-the-computers-at-narrative-science-replace-paid-writers/255631/ A look at new software that could transform journalism. In a few short years, we've learned to delegate all manner of tasks to computers. For music recommendations or driving directions or academic scouring, we readily turn to our clever machines.

Can the Computers at Narrative Science Replace Paid Writers? - Joe Fassler - Entertainment

About Wordnik

http://www.wordnik.com/about What is Wordnik? Wordnik is a new way to discover meaning. This page will give you a quick overview of what you can do, learn, and share with Wordnik.
MBSP is a text analysis system based on the TiMBL and MBT memory-based learning applications developed at CLiPS and ILK. It provides tools for Tokenization and Sentence Splitting, Part of Speech Tagging, Chunking, Lemmatization, Relation Finding and Prepositional Phrase Attachment. The general English version of MBSP has been trained on data from the Wall Street Journal corpus. http://www.clips.ua.ac.be/pages/MBSP

MBSP for Python | CLiPS

Projects | CLiPS

The AMiCA (“Automatic Monitoring for Cyberspace Applications”) project aims to mine relevant social media (blogs, chat rooms, and social networking sites) and collect, analyse, and integrate large amounts of information using text and image analysis. The ultimate goal is to trace harmful content, contact, or conduct in an automatic way. Essentially, we take a cross-media mining approach that allows us to detect risks “on-the-fly”. When critical situations are detected (e.g. a very... http://www.clips.ua.ac.be/projects
(by Eric Forsyth, Jane Lin, and Craig Martell) License and Legal Issues: This corpus is distributed solely for non-commercial, non-profit educational and research use. It is a derivative compilation work of multiple works whose copyrights are held by the respective original authors. How to get the NPS Chat Corpus: The NPS Chat Corpus is part of the Natural Language Toolkit (NLTK) distribution.

http://faculty.nps.edu/cmartell/NPSChat.htm
Corpora Survey Note: This survey is based on my (forthcoming) chapter "Well-known and influential corpora", written for A. Lüdeling, M.

Corpus Based Language Studies

At the Brains, Minds, and Machines symposium held during MIT's 150th birthday party, Technology Review reports that Prof. Noam Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don't try to understand the meaning of that behavior. The transcript is now available, so let's quote Chomsky himself: It's true there's been a lot of work on trying to apply statistical models to various linguistic problems.

On Chomsky and the Two Cultures of Statistical Learning

Phone to track emotional behaviour

29 September 2010. A system which allows psychologists to track people's emotional behaviour through their phones has been successfully road-tested by scientists. St Andrews University researchers have developed in-built phone sensors to work out how people's emotions are influenced by their surroundings. The system, EmotionSense, uses sensors and speech-recognition software.
PAPERS-nlp

This course is designed to introduce students to the fundamental concepts and ideas in natural language processing (NLP), and to get them up to speed with current research in the area. It develops an in-depth understanding of both the algorithms available for the processing of linguistic information and the underlying computational properties of natural languages. Word-level, syntactic, and semantic processing are considered from both a linguistic and an algorithmic perspective. The focus is on modern quantitative techniques in NLP: using large corpora, statistical models for acquisition, disambiguation, and parsing. The course also examines and constructs representative systems.

School of Engineering - Stanford Engineering Everywhere

Structured Prediction Problems in Natural Language Processing

Modeling language at the syntactic or semantic level is a key problem in natural language processing, and involves a challenging set of structured prediction problems. In this talk I'll describe work on machine learning approaches for syntax and semantics, with a particular focus on lexicalized grammar formalisms such as dependency grammars, tree adjoining grammars, and categorial grammars. I'll address key issues in the following areas: 1) the design of learning algorithms for structured linguistic data;
Visual word recognition

Natural language processing