
Tools, resources


Data Science Toolkit. Think Stats: Probability and Statistics for Programmers. By Allen B. Downey, published by O'Reilly Media. The second edition of this book is available now. We recommend you switch to the new (and improved) version! Order Think Stats from Amazon.com. Download this book in PDF. Read this book online. Code examples and solutions are available from this zip file. Download data files for use with the book. Read the related blog Probably Overthinking It. Description: Think Stats is an introduction to Probability and Statistics for Python programmers. Think Stats emphasizes simple techniques you can use to explore real data sets and answer interesting questions.

This book is under the Creative Commons Attribution-NonCommercial 3.0 Unported License, which means that you are free to copy, distribute, and modify it, as long as you attribute the work and don't use it for commercial purposes. Other Free Books by Allen Downey are available from Green Tea Press. Data Publica.

Portail de la statistique publique (French official statistics portal). Public Data Explorer. Human development indicators: Human Development Report 2013, United Nations Development Programme. The data used to calculate the Human Development Index (HDI) and other composite indices presented in the Human Development Report ... Eurostat, demographic indicators: Eurostat annual demographic indicators. Unemployment in Europe (monthly data): harmonised unemployment data for European countries. Minimum wage in Europe: gross monthly minimum wage in euros or purchasing power parities, semi-annual data. Public debt in Europe: statistics on the public finances of European countries. INSEE.

Gapminder: Unveiling the beauty of statistics for a fact based world view. Freebase. The R Project for Statistical Computing. Rewire the web. Data Scraping Wikipedia with Google Spreadsheets. Prompted in part by a presentation I have to give tomorrow as an OU eLearning community session (I hope some folks turn up – the 90 minute session on Mashing Up the PLE – RSS edition is the only reason I’m going in…), and in part by Scott Leslie’s compelling programme for a similar duration Mashing Up your own PLE session (scene setting here: Hunting the Wily “PLE”), I started having a tinker with using Google spreadsheets for data table screenscraping. So here’s a quick summary of (part of) what I found I could do. The Google spreadsheet function =importHTML(“”,”table”,N) will scrape a table from an HTML web page into a Google spreadsheet.
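For readers who want to see what "scrape the N'th table from a page" amounts to outside a spreadsheet, here is a minimal Python sketch using only the standard library. It is not how Google implements importHTML; the class name `TableScraper`, the function `scrape_table`, and the sample page content are all invented for illustration. The index is zero-based here as an assumption, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # all tables found so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def scrape_table(html, n):
    """Return the n'th table (zero-based, an assumption) as rows of strings."""
    parser = TableScraper()
    parser.feed(html)
    return parser.tables[n]


# Hypothetical page content, invented for this sketch.
PAGE = """
<table><tr><th>City</th><th>Population</th></tr>
<tr><td>Leeds</td><td>750000</td></tr></table>
<table><tr><td>other</td></tr></table>
"""

rows = scrape_table(PAGE, 0)
for row in rows:
    print(",".join(row))  # one comma-separated line per table row
```

Each row comes back as a list of strings, which is roughly the shape a spreadsheet fills its cells with.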

The URL of the target web page, and the target table element both need to be in double quotes. The number N identifies the N’th table in the page (counting starts at 0) as the target table for data scraping. Grab the URL, fire up a new Google spreadsheet, and start to enter the formula “=importHTML” into one of the cells: Why CSV? Here’s why: Many Eyes. OpenHeatMap. Playing with heat-mapping UK data on OpenHeatMap. Last night OpenHeatMap creator Pete Warden announced that the tool now allowed you to visualise UK data. I’ve been gleefully playing with the heat-mapping tool today and thought I’d share some pointers on visualising data on a map. This is not a tutorial for OpenHeatMap – Pete’s done a great job of that himself – but rather an outline of the steps to get some map-ready data in the first place. 1. Find a dataset to visualise.

You first need data that fits the geographical areas supported by OpenHeatMap (countries, constituencies, local authorities, districts and counties), and which suits geographical visualisation. My first stop was the data.gov.uk RSS feed to see what recent datasets had been released, but you could also do advanced searches for “unemployment by county” etc. if you are looking for something specific to visualise. Helpfully, each dataset description includes a field on “Geographical granularity”. 2. 3. Scraping for Journalism: A Guide for Collecting Data. Photo by Dan Nguyen/ProPublica Our Dollars for Docs news application lets readers search pharmaceutical company payments to doctors.

We’ve written a series of how-to guides explaining how we collected the data. Most of the techniques are within the ability of the moderately experienced programmer. The most difficult-to-scrape site was actually a previous Adobe Flash incarnation of Eli Lilly’s disclosure site. Lilly has since released their data in PDF format. These recipes may be most helpful to journalists who are trying to learn programming and already know the basics. If you are a complete novice and have no short-term plan to learn how to code, it may still be worth your time to find out about what it takes to gather data by scraping web sites -- so you know what you’re asking for if you end up hiring someone to do the technical work for you. The tools: With the exception of Adobe Acrobat Pro, all of the tools we discuss in these guides are free and open-source. A Guide to the Guides. Data Visualization. Tutorials. How to Make a State Grid Map in R: Something of a cross between a reference table and a map, the state grid provides equal space to each state and a semblance of the country to quickly pick out individual states.

How to Make Animated Line Charts in R: Sometimes it's useful to animate the multiple lines instead of showing them all at once. How to Make a Multi-line Step Chart in R: For the times your data represents immediate changes in value. Symbols-based Charts to Show Counts in R: Add visual weight by using individual items to show counts. Introducing a Course for Mapping in R: Mapping geographic data in R can be tricky, because there are so many ways to complete separate tasks. How to Edit R Charts in Adobe Illustrator: A detailed guide for R users who want to polish their charts in the popular graphic design app for readability and aesthetics. How to Make an Animated Map in R, Part 4: In the last part of the four-part series, you make a longer animation with more data and annotate.

How to Make Bubble Charts. A bubble chart can also just be straight up proportionally sized bubbles, but here we're going to cover how to create the variety that is like a scatterplot with a third, bubbly dimension. The advantage of this chart type is that it lets you compare three variables at once. One is on the x-axis, one is on the y-axis, and the third is represented by area size of bubbles. Have a look at the final chart to see what we're making. Step 0. Download R We're going to use R to do this, so download that before moving on. It's free and open-source, so you have nothing to lose. Step 1. Assuming you already have R open, the first thing we'll do is load the data. Okay, moving on. You're telling R to download the data and read it as a comma-delimited file with a header.
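The tutorial does this step in R. As a rough parallel, the Python sketch below reads comma-delimited text with a header row using the standard `csv` module. The column names and values are invented stand-ins, not the tutorial's data, and the text is read from an in-memory string rather than downloaded.

```python
import csv
import io

# Hypothetical comma-delimited data with a header row (not the tutorial's file).
RAW = "name,x,y,size\nA,1.5,2.0,100\nB,3.0,4.5,400\n"


def read_rows(text):
    """Parse CSV text whose first line is a header into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


records = read_rows(RAW)

# csv returns every field as a string, so convert numeric columns explicitly.
sizes = [float(r["size"]) for r in records]
print(records[0]["name"], sizes)
```

The header row supplies the keys, so each later column can be referred to by name instead of by position.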

Step 2. Now we can get right to drawing circles with the symbols() command. Run the line of code above, and you'll get this: Circles incorrectly sized by radius instead of area. All done, right? Step 3. Area of circle = πr² Yay. Step 4.
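The point about sizing by area rather than radius follows from the formula: if a value should equal the circle's area, the radius is the square root of the value divided by π. The Python sketch below works through that arithmetic. The function name `radius_for_area` and the sample values are hypothetical, and this is not the tutorial's R code.

```python
import math


def radius_for_area(value):
    """Radius of a circle whose area equals `value` (area = pi * r**2)."""
    return math.sqrt(value / math.pi)


# Hypothetical data values; the second is four times the first.
values = [100.0, 400.0]
radii = [radius_for_area(v) for v in values]

# Quadrupling the value should only double the radius.
ratio = radii[1] / radii[0]
print(radii, ratio)
```

Passing the raw values as radii would instead make the second bubble cover sixteen times the area of the first, which overstates the difference.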