Data

Data Visualization

Hadoop

http://commoncrawl.org/data/

Data | CommonCrawl

Common Crawl produces and maintains a repository of web crawl data that is openly accessible to everyone. The crawl currently covers 6 billion pages and the repository includes valuable metadata. The crawl data is stored on Amazon’s Public Data Sets, allowing it to be bulk downloaded as well as directly accessed for map-reduce processing in EC2.
bayesian

Recently via Twitter I came across “Gibbs Sampling for the Uninitiated” by Philip Resnik and Eric Hardisty, a tutorial that shows how to use Gibbs sampling of a Naive Bayes model to estimate the labels on a set of documents. This paper goes through the algebra in great detail and concludes with pseudocode. Resnik and Hardisty do such a good job of making it look easy that I decided to write my own Gibbs sampler. http://cornercases.wordpress.com/2011/10/06/gibbs-sampling-for-the-uninitiated-for-the-uninitiated/

“Gibbs Sampling for the Uninitiated” for the Uninitiated | Corner Cases
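As a rough illustration of the idea in the blurb above, here is a minimal sketch of a Gibbs sampler that estimates document labels under a two-class Naive Bayes model. It is not the derivation from the Resnik and Hardisty tutorial nor the blog author's code: it assumes a simpler, uncollapsed variant with a uniform Beta prior on the class proportion and a symmetric Dirichlet(1) prior on each class's word distribution. The function name `gibbs_naive_bayes` and all of its parameters are hypothetical.

```python
import math
import random


def gibbs_naive_bayes(docs, vocab_size, n_iter=200, seed=0):
    """Uncollapsed Gibbs sampler for a two-class multinomial Naive Bayes model.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns the sampled label (0 or 1) of every document after n_iter sweeps.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in docs]  # random initial labels

    for _ in range(n_iter):
        # 1. Sample the class prior pi = P(label = 1) from its Beta posterior
        #    (assumed uniform Beta(1, 1) prior), clamped away from 0 and 1.
        n1 = sum(labels)
        n0 = len(docs) - n1
        pi = rng.betavariate(1 + n1, 1 + n0)
        pi = min(max(pi, 1e-12), 1 - 1e-12)

        # 2. Sample each class's word distribution from its Dirichlet posterior
        #    (a Dirichlet draw is a vector of normalised Gamma draws).
        thetas = []
        for c in (0, 1):
            counts = [1.0] * vocab_size  # assumed symmetric Dirichlet(1) prior
            for doc, label in zip(docs, labels):
                if label == c:
                    for w in doc:
                        counts[w] += 1
            draws = [max(rng.gammavariate(a, 1.0), 1e-300) for a in counts]
            total = sum(draws)
            thetas.append([g / total for g in draws])

        # 3. Resample every label given pi and the two word distributions.
        for i, doc in enumerate(docs):
            log_odds = math.log(pi) - math.log(1 - pi)
            for w in doc:
                log_odds += math.log(thetas[1][w]) - math.log(thetas[0][w])
            log_odds = max(-700.0, min(700.0, log_odds))  # avoid exp overflow
            p1 = 1.0 / (1.0 + math.exp(-log_odds))
            labels[i] = 1 if rng.random() < p1 else 0

    return labels
```

Calling `gibbs_naive_bayes(docs, vocab_size=6)` on a list of word-id lists returns one label per document. The labels 0 and 1 carry no fixed meaning: the sampler may swap them between runs with different seeds, so only the grouping of documents is informative.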

This is the home page for the book, "Bayesian Data Analysis," by Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B.

"Bayesian Data Analysis": author's page

http://www.stat.columbia.edu/~gelman/book/
r

R Programming

http://www.dataspora.com/2009/02/predictive-analytics-using-r/ (March 26th Update: Video now available) Last night, I moderated our Bay Area R Users Group kick-off event with a panel discussion entitled “The R and Science of Predictive Analytics”, co-located with the Predictive Analytics World conference here in SF.

How Google and Facebook are using R

Z

Open Source

Information graphics are visual representations of information, data or knowledge. http://blog.dreamcss.com/graphics-tools/information-graphics-software/

8 useful open source information graphics software