


- Create your own machine learning powered RSS... - Algorithmia Blog
- Enough Machine Learning to Make Hacker News Readable Again
- Wikipedia Miner
- Natural Language Processing From Scratch
- Alyona Medelyan: Understanding human language with Python

Posted by Christian Szegedy, Software Engineer. The ImageNet large-scale visual recognition challenge (ILSVRC) is the largest academic challenge in computer vision, held annually to test state-of-the-art technology in image understanding, both in the sense of recognizing objects in images and locating where they are.

Teaching machines to read between the lines (and a new corpus with entity salience annotations)

Participants in the competition include leading academic institutions and industry labs. In 2012 it was won by DNNResearch using the convolutional neural network approach described in the now-seminal paper by Krizhevsky et al.[4] In this year’s challenge, team GoogLeNet (named in homage to LeNet, Yann LeCun's influential convolutional network) placed first in the classification and detection (with extra training data) tasks, doubling the quality on both tasks over last year's results.

HTML Text Extraction

Do you know any Natural Language Processing best-sellers?

- Best NLP books

Natural Language Processing (NLP) is a vast field and, more importantly, a fast-moving area of research today. In this post we offer a review of the most interesting books about NLP. Every researcher and scientist needs a solid theoretical foundation, which is why we recommend these books for your consideration and discussion.
1. Free version of the 1st edition here
2.

CS 276 / LING 286: Information Retrieval and Web Search. Lecture: 3 units, Tu/Th 4:15-5:30 at NVIDIA Auditorium (available online through SCPD). Course description: basic and advanced techniques for text-based information systems: efficient text indexing; Boolean and vector space retrieval models; evaluation and interface issues; Web search, including crawling, link-based algorithms, and Web metadata; text/Web clustering and classification; text mining.
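Among the course topics above is efficient text indexing with Boolean retrieval. As a minimal sketch of the idea (the toy documents below are made up for illustration, not taken from the course):

```python
from collections import defaultdict

# Hypothetical toy corpus, for illustration only.
docs = {
    0: "web search uses an index",
    1: "an inverted index maps terms to documents",
    2: "boolean retrieval intersects posting lists",
}

# Build the inverted index: term -> set of IDs of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def boolean_and(term_a, term_b):
    """Boolean AND query: IDs of documents containing both terms."""
    return sorted(index[term_a] & index[term_b])

print(boolean_and("index", "an"))  # [0, 1]
```

Real systems store postings as sorted lists and intersect them with a merge walk rather than set operations, but the data structure is the same.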

CS276: Information Retrieval and Web Search

Policies and information (grading, etc.). Prerequisites: CS 107, CS 109, CS 161. Online resources.

Untitled

Representing words as high-dimensional vectors. Making computers understand human language is an active area of research, called Natural Language Processing (NLP).


A widely used method of NLP research involves the statistical modeling of N-grams, which are collected from freely available text corpora and treated as single “atomic” units. While this has the benefit of being able to create simple models that can be trained on large amounts of data, it suffers when a large dataset isn’t available, such as high-quality transcribed speech data for automatic speech recognition, or when one wants a notion of similarity between words. In the paper Efficient Estimation of Word Representations in Vector Space, Googlers Tomas Mikolov, Kai Chen, Greg Corrado, and Jeff Dean describe recent progress on the application of neural networks to understanding human language.

Research

Almost full list of publications. Selected recent papers.
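The notion of similarity between words that N-gram models lack is exactly what vector representations provide: similar words get nearby vectors, compared with cosine similarity. A toy sketch (the 4-dimensional vectors below are hand-written hypothetical values; learned word2vec embeddings have hundreds of dimensions):

```python
import math

# Hypothetical word vectors, hand-picked so that "king" and "queen"
# point in similar directions while "apple" points elsewhere.
vectors = {
    "king":  [0.90, 0.80, 0.10, 0.20],
    "queen": [0.85, 0.75, 0.15, 0.80],
    "apple": [0.10, 0.05, 0.90, 0.30],
}

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Semantically related words score higher than unrelated ones.
print(cosine(vectors["king"], vectors["queen"]) >
      cosine(vectors["king"], vectors["apple"]))  # True
```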


Contents: IFT6266 course notes, Winter 2012.

A tutorial given at NAACL HLT 2013.

Deep Learning Tutorial - NAACL 2013

Based on an earlier tutorial given at ACL 2012 by Richard Socher, Yoshua Bengio, and Christopher Manning.

While I’ve become somewhat bemused during the course of my research by the plethora of linguistic theories that have come and gone over the last three decades, not to mention all the “busy work” carried out by the statistical natural language processing crowd, there remains a hard core of research dedicated to rigorous deterministic methods.

NLP, Grammar, and Other Communication concepts

I just discovered a real gem of this type: an all-platform, open-source natural language parser that implements PATR, a simple but powerful language for defining the grammar of natural languages. You can download the executables and documentation for your platform of choice, or grab the source code and compile it yourself. It’s the cleanest and easiest-to-use parsing tool I’ve ever come across, rivalling the Link Grammar Parser in its simplicity, but significantly more powerful and versatile. You can jump straight to the documentation for examples and usage details here:

An open source web scraping framework for Python

Pattern

Pattern is a web mining module for the Python programming language.
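The PATR formalism mentioned above builds grammars on unification of feature structures. A toy Python sketch of that core operation (the feature names are hypothetical, and this is not PATR's actual implementation):

```python
def unify(fs1, fs2):
    """Unify two feature structures (nested dicts); return None on clash."""
    result = dict(fs1)
    for key, val in fs2.items():
        if key not in result:
            result[key] = val
        elif isinstance(result[key], dict) and isinstance(val, dict):
            sub = unify(result[key], val)
            if sub is None:
                return None  # clash inside a nested structure
            result[key] = sub
        elif result[key] != val:
            return None  # atomic values disagree: unification fails
    return result

# A 3rd-person-singular subject unifies with a singular verb,
# but clashes with one requiring plural agreement.
subject = {"agreement": {"number": "singular", "person": "3rd"}}
verb_sg = {"agreement": {"number": "singular"}}
verb_pl = {"agreement": {"number": "plural"}}
print(unify(subject, verb_sg) is not None)  # True
print(unify(subject, verb_pl) is None)      # True
```

A PATR grammar rule is essentially a context-free rule annotated with equations that force such unifications between the feature structures of its constituents.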


It has tools for data mining (Google, Twitter, and Wikipedia APIs, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis, and <canvas> visualization. The module is free, well-documented, and bundled with 50+ examples and 350+ unit tests.

Download and installation. Pattern is written for Python 2.5+ (no support for Python 3 yet). To install Pattern so that the module is available in all Python scripts, from the command line do:

> cd pattern-2.6
> python setup.py install

If you have pip, you can automatically download and install from the PyPI repository. If none of the above works, you can make Python aware of the module in three ways.

Quick overview: pattern.web, pattern.en, pattern.vector, case studies. The pattern.en module is a natural language processing (NLP) toolkit for English.
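The vector space model that pattern.vector implements represents each document as a bag-of-words vector and compares documents by cosine similarity. A self-contained sketch of the idea in plain Python (the mini-corpus is made up, and this is not Pattern's API):

```python
from collections import Counter
import math

# Hypothetical mini-corpus: two related documents and one unrelated one.
doc_a = "the cat sat on the mat"
doc_b = "the cat lay on the rug"
doc_c = "stock markets fell sharply today"

def tf_vector(text):
    """Bag-of-words term-frequency vector as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# Documents about the same topic score higher than unrelated ones.
print(cosine(tf_vector(doc_a), tf_vector(doc_b)) >
      cosine(tf_vector(doc_a), tf_vector(doc_c)))  # True
```

Production systems typically weight terms by tf-idf rather than raw counts, but the comparison step is the same.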


Michael Collins.