

Pattern is a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization. The module is free, well-documented and bundled with 50+ examples and 350+ unit tests.

Installation

Pattern is written for Python 2.5+ (no support for Python 3 yet). To install Pattern so that the module is available in all Python scripts, from the command line do:

    > cd pattern-2.6
    > python setup.py install

If you have pip, you can automatically download and install from the PyPI repository. If none of the above works, there are three ways to make Python aware of the module.

The pattern.en module is a natural language processing (NLP) toolkit for English.


mechanize

Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize. The examples below are written for a website that does not exist, so they cannot be run. There are also some working examples that you can run.

    import re
    import mechanize

    br = mechanize.Browser()
    # follow second link with element text matching regular expression
    response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
    assert br.viewing_html()
    print br.title()
    print response1.geturl()
    print response1.info()  # headers
    print response1.read()  # body

MBSP for Python

MBSP is a text analysis system based on the TiMBL and MBT memory-based learning applications developed at CLiPS and ILK. It provides tools for Tokenization and Sentence Splitting, Part of Speech Tagging, Chunking, Lemmatization, Relation Finding and Prepositional Phrase Attachment. The general English version of MBSP has been trained on data from the Wall Street Journal corpus.

The IATI Standard

Alpha version. Please note that the Datastore is currently in its first release. Therefore, data queries may sometimes produce unexpected results.

Essential Resources: Mapping applications, frameworks and libraries

This is part of a series of posts to share with readers a useful collection of some of the most important, effective and practical data visualisation resources. This post presents the many different options for visualising spatial data. Please note, I may not have personally used all the packages or tools presented but have seen sufficient evidence of their value from other sources.

oluolu - Project Hosting on Google Code

Oluolu is an open source query log mining tool which works on Hadoop. This tool provides resources to add new features to search engines. Concretely, Oluolu supports automatic dictionary creation, such as spelling correction, context queries or frequent query n-grams, from query log data.
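To make "frequent query n-grams" concrete, here is a minimal in-memory sketch in plain Python. It is not Oluolu's implementation (Oluolu runs on Hadoop and its API is not shown here). The function name `frequent_ngrams`, its parameters and the sample queries are all made up for illustration.

```python
from collections import Counter

def frequent_ngrams(queries, n=2, min_count=2):
    """Count word n-grams across queries; keep those seen at least min_count times.

    Hypothetical helper for illustration only, not part of Oluolu.
    """
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        # Slide a window of n tokens over the query.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gram, count) for gram, count in ranked if count >= min_count]

# Invented sample query log.
query_log = [
    "cheap flights paris",
    "cheap flights rome",
    "flights paris hotel",
    "paris hotel deals",
]
for gram, count in frequent_ngrams(query_log):
    print("%s: %d" % (gram, count))
```

A real query log would not fit in memory like this, which is presumably why the tool builds on Hadoop. The counting step itself stays the same.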

PySide

Welcome to the PySide documentation wiki page. The PySide project provides LGPL-licensed Python bindings for Qt.

100 days of web mining

In this experiment, we collected Google News stories at regular 1-hour intervals between November 22, 2010, and March 8, 2011, resulting in a set of 6,405 news stories. We grouped these per day and then determined the top daily keywords using tf-idf, a measurement of a word's uniqueness or importance. For example: if the word "news" is mentioned every day, it is not particularly unique on any single given day.
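The idea of daily keywords by tf-idf can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the experiment. The tf-idf variant chosen here is relative term frequency times log(N / document frequency), and `top_keywords` and the three toy "days" are invented for the example.

```python
import math
from collections import Counter

def top_keywords(docs, n=3):
    """Return the n highest-scoring tf-idf words per document.

    docs maps a label (here: a day) to a list of word tokens.
    Hypothetical helper for illustration only.
    """
    # Document frequency: the number of documents each word occurs in.
    df = Counter()
    for words in docs.values():
        df.update(set(words))
    total = len(docs)
    result = {}
    for label, words in docs.items():
        scores = {}
        for word, count in Counter(words).items():
            tf = count / float(len(words))            # relative frequency in this document
            idf = math.log(total / float(df[word]))   # 0 for a word found in every document
            scores[word] = tf * idf
        # Highest score first; ties are broken alphabetically.
        result[label] = sorted(scores, key=lambda w: (-scores[w], w))[:n]
    return result

# Invented toy data: one token list per day.
days = {
    "day1": "news election election vote".split(),
    "day2": "news storm storm flood".split(),
    "day3": "news football match football".split(),
}
keywords = top_keywords(days, n=1)
for day in sorted(keywords):
    print("%s: %s" % (day, ", ".join(keywords[day])))
```

Because "news" occurs on every day, its idf is log(3/3) = 0, so it never outranks a word that is specific to a single day.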

An Interactive Infographic Maps The Future Of Emerging Technology

Can speculation about the future of technology serve as a measuring stick for what we create today? That's the idea behind Envisioning Technology's massive infographic (PDF), which maps the future of emerging technologies on a loose timeline between now and 2040. On it you'll find predictions about everything from artificial intelligence and robotics to geoengineering and energy.
