
Tools, resources


Data Science Toolkit

Think Stats: Probability and Statistics for Programmers

By Allen B. Downey, published by O'Reilly Media. Order Think Stats from … Download this book in PDF.

Data Publica

Portail de la statistique publique (public statistics portal)

Public Data Explorer

Human Development Indicators, Human Development Report 2013, United Nations Development Programme. The data used to calculate the Human Development Index (HDI) and the other composite indices presented in the Human Development Report ...
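For context, here is how the UNDP has combined these indicators since its 2010 report, which is the method the 2013 report uses. This summary comes from the UNDP's published technical notes, not from the linked page. Each dimension indicator $x$ is rescaled against minimum and maximum goalposts. The HDI is then the geometric mean of the health, education and income dimension indices. For income, the rescaling is applied to the natural logarithm of GNI per capita.

```latex
I_{\text{dim}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},
\qquad
\mathrm{HDI} = \left( I_{\text{Health}} \cdot I_{\text{Education}} \cdot I_{\text{Income}} \right)^{1/3}
```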

INSEE

Gapminder: Unveiling the beauty of statistics for a fact-based world view

Freebase

The R Project for Statistical Computing

Rewire the web

We are introducing a new module, the XPATH Fetch Page. We are also going to deprecate the Fetch Page module at the end of June, so please convert your existing Pipes that use the Fetch Page module to the XPATH Fetch Page module. In this talk from YUIConf 2011, we demonstrate the features of the Yahoo! Pipes editor and explain how you can use Pipes and YQL to power your web apps, create mashups, and more. Frontend changes: the String builder, Item builder and URL builder modules now allow a maximum of 40 items per module, up from 10.

Data Scraping Wikipedia with Google Spreadsheets

Prompted in part by a presentation I have to give tomorrow as an OU eLearning community session (I hope some folks turn up; the 90-minute session on Mashing Up the PLE – RSS edition is the only reason I'm going in…), and in part by Scott Leslie's compelling programme for a similar-length Mashing Up your own PLE session (scene setting here: Hunting the Wily "PLE"), I started having a tinker with using Google spreadsheets for data table screenscraping.
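The technique in this post relies on the Google spreadsheet function =importHTML, which pulls a single table out of a web page. The Python sketch below is a rough illustration of that idea, not Google's implementation. It uses only the standard library's html.parser to collect every top-level <table> in an HTML string and return the Nth one as rows of cell text. The names TableScraper and import_html_table are hypothetical. The index is zero-based, as in the post. Fetching the page (for example with urllib.request) is left out so that the example runs offline.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every top-level <table> as a list of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row a list of cell strings
        self._depth = 0    # current <table> nesting depth
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse internal whitespace, roughly as a spreadsheet cell shows it.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])
        elif self._depth == 1 and tag == "tr":
            self._close_row()          # tolerate a missing </tr>
            self._row = []
        elif self._depth == 1 and tag in ("td", "th"):
            self._close_cell()         # tolerate a missing </td>
            if self._row is None:      # cell outside any <tr>: start an implicit row
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            if self._depth == 1:
                self._close_row()      # flush anything still open
            self._depth -= 1
        elif self._depth == 1 and tag in ("td", "th"):
            self._close_cell()
        elif self._depth == 1 and tag == "tr":
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, index):
    """Return the index-th (zero-based) top-level table of an HTML string as rows."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.tables[index]


page = """<table><tr><th>Country</th><th>Pop</th></tr>
<tr><td>France</td><td>65 m</td></tr></table>"""
print(import_html_table(page, 0))
```

This is deliberately simple. Text from nested tables is merged into the enclosing cell, and rowspan/colspan attributes are ignored. A real page would need those cases handled before the output could stand in for a spreadsheet import.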

So here's a quick summary of (part of) what I found I could do. The Google spreadsheet function =importHTML("","table",N) will scrape a table from an HTML web page into a Google spreadsheet. The URL of the target web page and the target element type ("table") both need to be in double quotes. The number N identifies the Nth table in the page (counting starts at 0) as the target table for data scraping. Google's current documentation for IMPORTHTML describes this index as starting at 1, so check which convention your spreadsheet uses. Grab the URL, fire up a new Google spreadsheet, and start to enter the formula =importHTML into one of the cells.

Many Eyes

OpenHeatMap

Playing with heat-mapping UK data on OpenHeatMap

Last night OpenHeatMap creator Pete Warden announced that the tool now allowed you to visualise UK data. I've been gleefully playing with the heat-mapping tool today and thought I'd share some pointers on visualising data on a map.

This is not a tutorial for OpenHeatMap (Pete's done a great job of that himself) but rather an outline of the steps to get some map-ready data in the first place.

1. Find a dataset to visualise. You first need data that fits the geographical areas supported by OpenHeatMap (countries, constituencies, local authorities, districts and counties) and that suits geographical visualisation. My first stop was the RSS feed, to see what recent datasets had been released, but you could also do advanced searches for "unemployment by county" etc. if you are looking for something specific to visualise. Helpfully, each dataset description includes a field on "Geographical granularity".
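Step 1 amounts to filtering a data catalogue on its "Geographical granularity" field. The Python sketch below assumes each dataset's metadata has already been loaded into a dict under a hypothetical geographical_granularity key. The records, the key name and the matching rules are all illustrative, not the catalogue's real schema. The function keeps only datasets whose granularity mentions one of the area types the post lists as supported by OpenHeatMap.

```python
import re

# Area types the post lists as supported by OpenHeatMap, normalised to lower case.
# Singular spellings are an assumption about how the metadata phrases them.
SUPPORTED_AREAS = {"country", "constituency", "local authority", "district", "county"}


def granularity_levels(text):
    """Split a free-text granularity field such as 'County; District' into normalised levels."""
    return {part.strip().lower() for part in re.split(r"[,;/]", text) if part.strip()}


def map_ready(datasets, supported=SUPPORTED_AREAS):
    """Keep datasets whose granularity includes at least one supported area type."""
    return [
        ds for ds in datasets
        if granularity_levels(ds.get("geographical_granularity", "")) & supported
    ]


# Hypothetical catalogue records, for illustration only.
catalogue = [
    {"title": "Unemployment by county", "geographical_granularity": "County"},
    {"title": "National GDP", "geographical_granularity": "National"},
    {"title": "Recorded crime", "geographical_granularity": "Local Authority; Police Force Area"},
    {"title": "Untagged dataset"},
]

for ds in map_ready(catalogue):
    print(ds["title"])
```

Matching is exact after lower-casing. A real catalogue would probably need plural forms and synonyms mapped onto these names first.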

Scraping for Journalism: A Guide for Collecting Data

Photo by Dan Nguyen/ProPublica. Our Dollars for Docs news application lets readers search pharmaceutical company payments to doctors.

We've written a series of how-to guides explaining how we collected the data. Most of the techniques are within the ability of a moderately experienced programmer. The most difficult site to scrape was actually a previous Adobe Flash incarnation of Eli Lilly's disclosure site; Lilly has since released its data in PDF format. These recipes may be most helpful to journalists who are trying to learn programming and already know the basics.

Data Visualization

Definition: tools that help users discern patterns in data, such as dynamic graphs, charts, maps and plots.

Tools:

- Chartle: create simple interactive charts (web-based, free).
- ColorBrewer: interactive tool for selecting the most appropriate colors for maps and other visualizations (web-based, free).
- Dundas: produces digital dashboard visualizations (web-based, commercial).
- Exhibit: "enables you to create html pages with dynamic exhibits of data collections without resorting to complex database and server-side technologies. The collections can be searched and browsed using faceted browsing."

Resources: Tutorials

How to Make and Use Bar Charts in R

The chart type seems simple enough, but there sure are a lot of bad ones out there. Get yourself out of default mode.

How to Make Bubble Charts

A bubble chart can also just be straight-up proportionally sized bubbles, but here we're going to cover how to create the variety that is like a scatterplot with a third, bubbly dimension.

The advantage of this chart type is that it lets you compare three variables at once: one is on the x-axis, one is on the y-axis, and the third is represented by the area of the bubbles.

Step 0. Download R.
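Because the third variable is shown by bubble area, each radius has to grow with the square root of the value. Scaling the radius linearly would exaggerate the differences. The small Python sketch below shows the arithmetic. The function names and the max_radius default are hypothetical choices for illustration.

```python
import math


def bubble_radii(values, max_radius=30.0):
    """Return radii whose circle AREAS are proportional to `values`.

    The largest value gets `max_radius`; every other bubble is scaled from it.
    """
    if not values:
        return []
    if any(v < 0 for v in values):
        raise ValueError("bubble sizes must be non-negative")
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    # Area = pi * r**2, so area proportional to value means r proportional to sqrt(value).
    return [max_radius * math.sqrt(v / largest) for v in values]


def naive_radii(values, max_radius=30.0):
    """The common mistake: radius (not area) proportional to value."""
    largest = max(values, default=0) or 1  # avoid dividing by zero
    return [max_radius * v / largest for v in values]


sizes = [25, 100]
print(bubble_radii(sizes))  # [15.0, 30.0]: areas in ratio 1:4, like the data
print(naive_radii(sizes))   # [7.5, 30.0]: areas in ratio 1:16, exaggerating the gap
```

The same square-root step applies in R. Base graphics' symbols(x, y, circles = r) interprets circles as radii, so the radii should be computed from the square root of the size variable rather than from the variable itself.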