Pattern Pattern is a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization. The module is free, well-documented and bundled with 50+ examples and 350+ unit tests. Installation Pattern is written for Python 2.5+ (no support for Python 3 yet). To install Pattern so that the module is available in all Python scripts, run from the command line: > cd pattern-2.6 > python setup.py install If you have pip, you can automatically download and install from the PyPI repository: If none of the above works, you can make Python aware of the module in three ways: Quick overview pattern.web pattern.en The pattern.en module is a natural language processing (NLP) toolkit for English. pattern.search pattern.vector Case studies
100 days of web mining In this experiment, we collected Google News stories at regular 1-hour intervals between November 22, 2010, and March 8, 2011, resulting in a set of 6,405 news stories. We grouped these per day and then determined the top daily keywords using tf-idf, a measurement of a word's uniqueness or importance. For example: if the word news is mentioned every day, it is not particularly unique on any single given day. To set up the experiment we used the Pattern web mining module for Python. The basic script is simple enough: Your code will probably have some preprocessing steps to save and load the mined news updates. In the image below, important words (i.e., events) that occurred across multiple days are highlighted (we took a word's document frequency as an indication). Simultaneously, we mined Twitter messages containing the words I love or I hate – 35,784 love-tweets and 35,212 hate-tweets in total. Daily drudge Here are the top keywords of hate-tweets grouped by day:
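As a rough illustration of the daily tf-idf ranking described in the experiment above, here is a minimal sketch in plain Python. It is not the Pattern-based script from the original article: the function name top_daily_keywords, the input layout (a dict mapping each day to a list of words) and the exact weighting (relative term frequency times the log of inverse document frequency) are assumptions made for this example.

```python
import math
from collections import Counter

def top_daily_keywords(docs_by_day, top=3):
    """Rank words per day by tf-idf (hypothetical helper, not Pattern's API).

    docs_by_day: dict mapping a day label to the list of words seen that day.
    Returns a dict mapping each day to its `top` highest-scoring words.
    """
    n_days = len(docs_by_day)
    # Document frequency: on how many days does each word appear at all?
    df = Counter()
    for words in docs_by_day.values():
        df.update(set(words))
    result = {}
    for day, words in docs_by_day.items():
        tf = Counter(words)
        total = len(words) or 1
        # A word that occurs every day gets log(1) = 0, so it never ranks high.
        scores = {w: (c / total) * math.log(n_days / df[w]) for w, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[day] = ranked[:top]
    return result

days = {
    "day 1": ["news", "snow", "snow", "storm"],
    "day 2": ["news", "election", "vote", "election"],
    "day 3": ["news", "snow", "budget"],
}
keywords = top_daily_keywords(days, top=2)
```

In this toy data the word news appears every day, so its score is zero and it drops out of each day's top keywords, which matches the intuition given in the text.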
Grammatical Features - Aspect Anna Kibort 1. What is 'aspect' The term 'aspect' designates the perspective taken on the internal temporal organisation of the situation, and so 'aspects' distinguish different ways of viewing the internal temporal constituency of the same situation (Comrie 1976:3ff, after Holt 1943:6; Bybee 2003:157). Aspectual meaning of a clause can be broken up into two independent aspectual components (Smith 1991/1997): Aspectual viewpoint - this is the temporal perspective from which the situation is presented. Aspectual meaning of a clause results from the interaction of aspectual viewpoint and situation type. 2. Aspectual characteristics are coded in a wide range of ways: lexical, derivational, or inflectional; synthetic ('morphological') and analytic ('syntactic'). Verbs tend to have inherent aspectual meaning because the situations described by them tend to have inherent temporal properties.
True Knowledge — The Internet Answer Engine The Story of Evi Evi was founded in August 2005, originally under the name of True Knowledge, with the mission of powering a new kind of search experience where users can access the world’s knowledge simply by asking for the information they need in a way that is completely natural. The True Knowledge internet answer engine was launched in 2007 to excellent response from users who were not only able to access the wealth of information Evi could provide, but were able to contribute directly to the ever growing database of facts. In 2011 development began on Evi, a brand new AI that advanced on the technology within the True Knowledge platform and which users would be able to interact with via her own mobile app. In October 2012, Evi was acquired by Amazon and is proud to now be part of the Amazon group of companies. What we do? Evi’s mission is to help people get what they want and need through our understanding of each user and the world they live in.
Terminology Extraction Introduction Terminology is the sum of the terms which identify a specific topic. Terminology extraction is the process of identifying those terms in a text. The idea is to compare the frequency of words in a given document with their frequency in the language. Technology It applies Poisson statistics, maximum likelihood estimation and inverse document frequency to compare the frequency of words in a given document with their frequency in a generic corpus of 100 million words per language. Why have we developed this? Translated has developed this technology to help its translators to be aware of the difficulties in a document and to simplify the process of creating glossaries. We also use it to improve search results in traditional search engines (es. I want it! If you are interested in this technology, please read more on Translated Labs and our services for natural language processing. I could do better!
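The core idea in the Terminology Extraction entry above, comparing a word's frequency in a document with its frequency in the language, can be sketched as follows. This is a deliberately simplified relative-frequency ratio with add-one smoothing, not the Poisson-based method the entry mentions. The function name extract_terms and the tiny reference counts are hypothetical and stand in for a real generic corpus.

```python
from collections import Counter

def extract_terms(doc_words, ref_counts, ref_total, top=5):
    """Rank words by how over-represented they are relative to a reference corpus.

    doc_words:  list of words in the document.
    ref_counts: dict of word -> count in the reference corpus (assumed data).
    ref_total:  total number of words in the reference corpus.
    """
    doc_counts = Counter(doc_words)
    doc_total = len(doc_words)
    # Add-one smoothing so that words unseen in the reference corpus
    # get a small non-zero probability instead of causing division by zero.
    smoothed_total = ref_total + len(ref_counts) + 1
    scores = {}
    for word, count in doc_counts.items():
        p_doc = count / doc_total
        p_ref = (ref_counts.get(word, 0) + 1) / smoothed_total
        scores[word] = p_doc / p_ref
    # Highest ratio first, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]

doc = "the tensor of the tensor field is the curvature".split()
reference = {"the": 60000, "of": 30000, "is": 10000, "field": 50}
terms = extract_terms(doc, reference, ref_total=1_000_000, top=3)
```

Common words such as the and of are frequent in both the document and the reference corpus, so their ratio stays low, while rare domain words rise to the top of the list.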
NLTK Home (Natural Language Toolkit)