
Natural language processing


The Story of Evi. Evi was founded in August 2005, originally under the name True Knowledge, with the mission of powering a new kind of search experience in which users can access the world’s knowledge simply by asking for the information they need in a completely natural way.

True Knowledge — The Internet Answer Engine

The True Knowledge internet answer engine was launched in 2007 to an excellent response from users, who were able not only to access the wealth of information Evi could provide but also to contribute directly to the ever-growing database of facts. The availability of the True Knowledge API resulted in numerous apps and services being powered by the True Knowledge engine. In January 2010 we logged one million users in a single month, a figure that had grown to five million unique users in the month of November. In 2011 development began on Evi, a brand new AI that built on the technology within the True Knowledge platform and that users would be able to interact with via her own mobile app.

Grammatical Features - Aspect

Anna Kibort

What is 'aspect'?

The term 'aspect' designates the perspective taken on the internal temporal organisation of the situation, and so 'aspects' distinguish different ways of viewing the internal temporal constituency of the same situation (Comrie 1976:3ff, after Holt 1943:6; Bybee 2003:157). The 'situation' is meant here as a general term covering events, processes, states, etc., as expressed by the verb phrase or the construction. Unlike tense, which is situation-external time, aspect is situation-internal and non-deictic, as it is not concerned with relating the time of the situation to any other time point.

Aspectual meaning of a clause can be broken up into two independent aspectual components (Smith 1991/1997): aspectual viewpoint, which is the temporal perspective from which the situation is presented, and situation type. The aspectual meaning of a clause results from the interaction of aspectual viewpoint and situation type.

100 days of web mining

In this experiment, we collected Google News stories at regular 1-hour intervals between November 22, 2010, and March 8, 2011, resulting in a set of 6,405 news stories.

We grouped these per day and then determined the top daily keywords using tf-idf, a measurement of a word's uniqueness or importance. For example: if the word news is mentioned every day, it is not particularly unique on any single given day. To set up the experiment we used the Pattern web mining module for Python. The basic script is simple enough; your code will probably also have some preprocessing steps to save and load the mined news updates.
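The daily-keyword step can be sketched in plain Python. This is not the experiment's actual script and it does not use the Pattern API: the function name `top_keywords`, the toy input, and the particular tf-idf variant (relative term frequency times log of days over document frequency) are all assumptions made for illustration.

```python
import math
from collections import Counter


def top_keywords(docs_by_day, n=3):
    """Return the n highest-scoring tf-idf words for each day.

    docs_by_day maps a day label to the concatenated text of that day's
    stories (a hypothetical input format, not the experiment's own).
    """
    counts = {day: Counter(text.lower().split())
              for day, text in docs_by_day.items()}
    num_days = len(counts)

    # Document frequency: on how many days does each word appear?
    df = Counter()
    for day_counts in counts.values():
        df.update(day_counts.keys())

    result = {}
    for day, day_counts in counts.items():
        total = sum(day_counts.values())
        # tf = relative frequency within the day, idf = log(N / df).
        scores = {w: (k / total) * math.log(num_days / df[w])
                  for w, k in day_counts.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[day] = ranked[:n]
    return result


# Toy data: "news" occurs every day, so its idf (and score) is zero.
days = {
    "day1": "news flood river flood warning",
    "day2": "news election vote election result",
    "day3": "news storm storm coast",
}
keywords = top_keywords(days, n=1)
print(keywords)
```

A word that appears on every day gets an idf of log(1) = 0, which is the behaviour described above for the word news.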

In the accompanying image, important words (i.e., events) that occurred across multiple days are highlighted (we took a word's document frequency as an indication). Simultaneously, we mined Twitter messages containing the words I love or I hate: 35,784 love-tweets and 35,212 hate-tweets in total.

Pattern

Pattern is a web mining module for the Python programming language.

It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization. The module is free, well-documented and bundled with 50+ examples and 350+ unit tests.

Installation

Pattern is written for Python 2.5+ (no support for Python 3 yet). To install Pattern so that the module is available in all Python scripts, run from the command line:

> cd pattern-2.6
> python setup.py install

If you have pip, you can automatically download and install from the PyPI repository:

Analogy as the Core of Cognition

Terminology Extraction

Introduction

Terminology is the sum of the terms which identify a specific topic.

Terminology extraction is the process of identifying those terms in a text. The idea is to compare the frequency of words in a given document with their frequency in the language. Words which appear very frequently in the document but rarely in the language are probably terms.

Technology

It uses Poisson statistics, Maximum Likelihood Estimation and Inverse Document Frequency to compare the frequency of words in a given document with their frequency in a generic corpus of 100 million words per language.

Why have we developed this?

Translated has developed this technology to help its translators to be aware of the difficulties in a document and to simplify the process of creating glossaries.
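The core idea of comparing document frequency with language frequency can be illustrated with a small sketch. It is not Translated's method: in place of the Poisson, MLE and IDF combination it uses a simpler count-weighted log-ratio with add-one smoothing. The function name `extract_terms` and the reference counts are made up for the example.

```python
import math
from collections import Counter


def extract_terms(document, reference_counts, reference_total, n=5):
    """Rank words that are frequent in the document but rare in the language.

    reference_counts and reference_total stand in for a generic corpus;
    the values used below are invented for illustration only.
    """
    words = document.lower().split()
    doc_counts = Counter(words)
    doc_total = len(words)
    # Add-one smoothing so that unseen words get a small non-zero probability.
    vocab_size = len(reference_counts) + 1

    scores = {}
    for word, count in doc_counts.items():
        p_doc = count / doc_total
        p_ref = (reference_counts.get(word, 0) + 1) / (reference_total + vocab_size)
        # Weight the log-ratio by the count so repeated words rank higher.
        scores[word] = count * math.log(p_doc / p_ref)

    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


# Hypothetical reference counts out of one million running words.
reference = {"the": 60000, "of": 30000, "is": 20000, "a": 25000,
             "valve": 5, "pressure": 20}
terms = extract_terms("the valve controls the pressure of the valve",
                      reference, 1_000_000, n=3)
print(terms)
```

Common function words such as "the" score low even when they dominate the document, because they are just as common in the reference counts.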

We also use it to improve search results in traditional search engines.

I want it!

If you are interested in this technology, please read more on Translated Labs and our services for natural language processing.

EUR-Lex

NLTK Home (Natural Language Toolkit)

Latent Dirichlet Allocation