Data science (aka Data mining)


Data mining, forecasting and bioinformatics competitions on Kaggle. All entries. Mining of Massive Datasets. The book has now been published by Cambridge University Press.

Mining of Massive Datasets

The publisher is offering a 20% discount to anyone who buys the hardcopy. By agreement with the publisher, you can still download it free from this page. Cambridge Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it. Haystack Group. Home - CKAN. Making Data Social.


Toolboxes. Datasets. Google Data Science tools. Data pre-processing and cleansing. Oluolu - Project Hosting on Google Code. Oluolu is an open source query log mining tool that runs on Hadoop.

oluolu - Project Hosting on Google Code

This tool provides resources for adding new features to search engines. Concretely, Oluolu supports automatic creation of dictionaries from query log data, for example for spelling correction, context queries, or frequent query n-grams. The dictionaries are applied to search engines to add features such as a 'did you mean' or 'related keyword suggestion' service. 2011-11-16: oluolu 0.2.1 released. Pattern. Pattern is a web mining module for the Python programming language.


It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization. The module is free, well-documented and bundled with 50+ examples and 350+ unit tests. Pattern is written for Python 2.5+ (no support for Python 3 yet). Special Online Collection: Dealing with Data.
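
To make the "frequent query n-grams" idea from the Oluolu description more concrete, here is a minimal sketch in plain Python. It does not use the Oluolu or Pattern APIs; the function names, the whitespace tokenization, the `min_count` threshold, and the sample queries are all assumptions made for illustration only.

```python
from collections import Counter


def query_ngrams(query, n):
    """Return the word n-grams of one query (lowercased, split on whitespace)."""
    tokens = query.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(queries, n=2, min_count=2):
    """Count n-grams over all queries and keep those seen at least min_count times."""
    counts = Counter()
    for q in queries:
        counts.update(query_ngrams(q, n))
    # most_common() sorts by descending count
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


# Hypothetical query log, invented for this example
sample_log = [
    "cheap flights to tokyo",
    "flights to tokyo",
    "hotels in tokyo",
    "cheap flights to osaka",
]

top = frequent_ngrams(sample_log, n=2, min_count=2)
for gram, count in top:
    print(gram, count)
```

A dictionary built this way could feed a 'related keyword suggestion' feature. A tool working on real query logs at Hadoop scale would presumably distribute the counting step rather than hold everything in one in-memory counter.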