Data science (aka Data mining)



About: ELKI is a framework for implementing data-mining algorithms with support for index structures. It includes a wide variety of clustering and outlier detection methods. http://mloss.org/software/
The book has now been published by Cambridge University Press.

Mining of Massive Datasets

http://infolab.stanford.edu/~ullman/mmds.html
Datasets for http://www.datadotgc.ca/. DataDotGC, which launched in February 2010, is a Canadian citizen-led effort to promote open data and help share data that has already been... http://thedatahub.org/

Home - CKAN - the Data Hub

https://opendata.socrata.com/ 2011 Report to Congress on White House Staff. Government. Tags: whitehouse, salary, government, congress, ... Since 1995, the White House has been required to deliver a report to Congress listing the title and salary of every White House Office employee.

Socrata | Making Data Social

Visualization

Toolboxes

Datasets

Google Data Science tools

Data pre-processing and cleansing

http://code.google.com/p/oluolu/

oluolu - Project Hosting on Google Code

Oluolu is an open-source query log mining tool that runs on Hadoop.
http://www.clips.ua.ac.be/pages/pattern

Pattern | CLiPS

Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics), clustering and classification (k-means, KNN, SVM), and data visualization (graph networks).
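The tf-idf and cosine similarity metrics named above can be illustrated with a minimal sketch in plain Python. This is not Pattern's API: the function names (tokenize, tf_idf_vectors, cosine_similarity), the whitespace tokenizer, and the sample documents are all assumptions made for illustration, and the weighting shown (relative term frequency times log(N / document frequency)) is only one common tf-idf variant.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, not taken from any of the linked sites.
docs = [
    "data mining on hadoop",
    "web mining with python",
    "open government data",
]
vectors = tf_idf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # share the term "mining"
sim_12 = cosine_similarity(vectors[1], vectors[2])  # share no terms
```

Documents that share a term get a similarity above zero, while documents with no terms in common score exactly zero. A term that occurs in every document would get an idf of zero under this variant and so would not contribute to the similarity.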
http://www.sciencemag.org/site/special/data/

Special Online Collection: Dealing with Data

In the 11 February 2011 issue, Science joins with colleagues from Science Signaling, Science Translational Medicine, and Science Careers to provide a broad look at the issues surrounding the increasingly huge influx of research data.