Data science (aka Data mining)
The book has now been published by Cambridge University Press.
Google Data Science tools
Data pre-processing and cleansing
oluolu - Project Hosting on Google Code. Oluolu is an open-source query log mining tool that runs on Hadoop.
Pattern is a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia APIs, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization.
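To give a rough idea of what the vector space model mentioned above involves, here is a minimal sketch in plain Python using only the standard library. It does not use Pattern's own API. The function names `term_vector` and `cosine_similarity` are hypothetical, and the whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a bag-of-words term-frequency vector (hypothetical helper).

    Assumes simple lowercasing and whitespace tokenization.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = term_vector("data mining tools")
    doc2 = term_vector("data science tools")
    print(cosine_similarity(doc1, doc2))
```

Documents that share more terms score closer to 1, and documents with no terms in common score 0. A real toolkit would typically add stemming, stop-word removal and term weighting on top of this.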
Special Online Collection: Dealing with Data