
Statistics


MOST POPULAR INFOGRAPHICS. Any set of figures needs adjusting before it can be usefully reported | Ben Goldacre. Fox News was excited: "Unplanned children develop more slowly, study finds." The Telegraph was equally shrill in its headline ("IVF children have bigger vocabulary than unplanned children"). And the British Medical Journal press release drove it all: "Children born after an unwanted pregnancy are slower to develop." The last two, at least, made a good effort to explain that this effect disappeared when the researchers accounted for social and demographic factors. But was there ever any point in reporting the raw finding, from before this correction was made?

I will now demonstrate, with a nerdy table illustration, how you correct for things such as social and demographic factors. You'll have to pay attention, because this is a tricky concept; but at the end, when the mystery is gone, you will see why reporting the unadjusted figures as the finding, especially in a headline, is silly and wrong. Correcting for an extra factor is best understood by doing something called "stratification". Data journalism and data visualization from the Datablog | News. My top ten data.gov.uk datasets - a guest post by Simon Rogers. Data.gov.uk has become one of the finest national open data initiatives in the world - it now has more data than the mighty data.gov in the US, with 4,223 datasets, compared to 2,876 over the Atlantic. It's not perfect - far too many links take you to front pages on other sites, rather than the data itself.

It could also do with more help for the less-experienced user, witness the multitude of downloads on the Treasury's Combined Online Information System (COINS) dataset ( But nevertheless, what a resource. And where it really comes into its own is in the publication of immense datasets previously kept within the confines of the civil service, many of which show highly local data.

So, if I had to pick my top ten data.gov.uk datasets here is where I would start: 1) National Public Transport Data Repository (NPTDR) If you want a complete dataset, look no further. 4) England in dog mess Really. Marathon 2010. A 24-hour student data visualization competition Click here to download Visualizing Marathon 2010 Poster. Welcome Visualizing.org is proud to have held the inaugural Visualizing Marathon: a 24-hour student data visualization competition. Inspired by robotics competitions and science fairs, the Marathon was created to give design students an opportunity to collaborate and use design to help tackle real-world issues. It was a fun and intense weekend full of speakers, performances, and of course, the actual competition. The Marathon was held over the Oct 22-24 weekend at Eyebeam in New York. Visualizing Marathon 2010 NYC. The Challenge Set by design leaders at the Cooper Hewitt's Why Design Now? The Jury Marathon Opening The Marathon officially opened on Friday, October 22nd and was kicked off with a night of speaker presentations and a party that included an interactive VJ performance.

Speakers: Agenda Future Marathons. Journalism in the Age of Data: A Video Report on Data Visualization by Geoff McGhee. Let's Intersect! Conditional Risk. Doctor Who: Every single journey through time detailed by Information is Beautiful. As a spreadsheet | Television & radio. Doctor Who time travels of the Doctor: Information is Beautiful gets the data - what can you do? Illustration: David McCandless for the Guardian Last year, I created a visualisation of Time travel in TV & Films.

You know. Star Trek, Back To The Future, Planet Of The Apes etc. I deliberately left out Doctor Who. It would've spaghetti-fied an already pretty hectic image. All the time, though, I really wanted to do a mega-visualisation of all of the Time Lord's journeys. I like Doctor Who. A stream of people came forward - programmers, researchers, fans (even a few people who sent pictures of themselves in floppy hats and homemade sonic screwdrivers). Here is the fruit of our labour - a list of every single journey through time made by the Doctor, featuring start year, end year, and location. Download a copy of this spreadsheet. It's twinned with a previous datablog dataset - Every Doctor Who Villain. How many times do you think the Doctor has travelled through time?

Check out the data to see. Escher-like internet map could speed online traffic - tech - 08 September 2010. A novel map of the internet created by Marián Boguñá and colleagues at the University of Barcelona, Spain, could help make network glitches a thing of the past. Boguñá squeezed the entire network into a disc using hyperbolic geometry, more familiar to us through the circular mosaic-like artworks of M. C. Escher. Each square on the map is an "autonomous system" – a section of the network managed by a single body such as a national government or a service provider. Like all good cartographers, Boguñá's team hopes their map will help speed up navigation.

Network coordinates Boguñá's map could do away with all this by providing "coordinates" for every system on the network. Although the map simply shows the number of connections between each autonomous system, the geography of the hyperbolic internet map often reflects that of the real world – for example, a number of western European nations are clustered in one sector. Journal reference: Nature Communications, DOI: 10.1038/ncomms1063. Statistical Analysis - Stack Exchange.

Research tips. The latest issue of the IJF is a bumper issue with over 500 pages of forecasting insights. The GEFCom2014 papers are included in a special section on probabilistic energy forecasting, guest edited by Tao Hong and Pierre Pinson. This is a major milestone in energy forecasting research with the focus on probabilistic forecasting and forecast evaluation done using a quantile scoring method. Only a few years ago I was having to explain to energy professionals why you couldn’t use a MAPE to evaluate a percentile forecast. With this special section, we now have a tutorial review on probabilistic electric load forecasting by Tao Hong and Shu Fan, which should help everyone get up to speed with current forecasting approaches, evaluation methods and common misunderstandings. The section also contains a large number of very high quality articles showing how to do state-of-the-art density forecasting for electricity load, electricity price, solar and wind power.

Statistics How To.

Information is beautiful

Statistics blogs. Statistical modeling, causal inference, and social science: Blog of Andrew Gelman's research group, featuring Bayesian statistics, multilevel modeling, causal inference, political science, decision theory, public health, sociology, economics, and literatu. How to visualize data with cartoonish faces ala Chernoff. FlowingData reader Chris asks: I was wondering, have you ever considered doing a Chernoff faces tutorial for R?

I think Chernoff faces are pretty interesting and I haven't seen much about them on the web. This wasn't the first time someone's asked how to make Chernoff faces, so I did a quick search. Guess what. There's an R package for that. This tutorial describes how to apply Chernoff faces to your own data. Chernoff Faces The point of Chernoff faces is to display multiple variables at once by positioning parts of the human face, such as ears, hair, eyes, and nose, based on numbers in a dataset. 1. Download R Like in previous tutorials, we'll be using R (surprise, surprise), the software environment for statistical computing and graphics, to make our Chernoff faces, so if you haven't already, download and install R first before moving on.

Step 1. Once you've opened up R, the first thing we need to do is install the aplpack (Another Plot Package) package by Peter Wolf. Step 2. Crime[1:6,] Why visualise data? | seeing data. Why visualise data? In the introduction to his classic text, The Visual Display of Quantitative Information, Edward Tufte answers this question in three words. “Graphics reveal data”. To illustrate his point Tufte asks the reader to examine four datasets of eleven (x, y) datapoints, collectively known as Anscombe’s quartet.

I’ve reproduced them in the figure below. The datasets that constitute Anscombe’s quartet share identical basic statistical properties and can be described by the same linear model. When the data is graphed their characteristics and differences become immediately apparent. Visualisation renders complex data accessible. This blog will examine the state of data visualisation in New Zealand and abroad. Problem solving flowchart (slightly crass) Stochastic. Randomness. There was a query on the SAS mailing list today - someone got inconsistent results for confidence intervals between Excel and SAS. In Excel, they were using the confidence() function, which I'd not come across before.

And I'm glad about that. See, to calculate a confidence interval, you multiply the standard error of the distribution by the critical value from the t-distribution. You can find that value using (say) R, with the qt() function, or Excel, with the tinv() function. The t-distribution approaches the normal distribution as the sample size increases - you need a sample size of infinity for them to be exactly the same, but if the sample size is large enough, then it's close. With the normal distribution, if you want a 95% confidence interval, the critical value is 1.96, which is so close to 2 that you can pretty much use 2 and get away with it.

(Around 95% of cases lie within 2 SDs of the mean in a normal distribution). However, this person had a sample size of 6.
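To see how much a sample size of 6 matters, here is a minimal sketch comparing the two critical values for a 95% interval. Python's standard library has no t-distribution quantile function, so the helpers `t_pdf`, `t_cdf` and `t_critical` are written for this illustration only (numerical integration plus bisection); they are not R's qt() or Excel's tinv().

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus the Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(p, df):
    """Bisection search for the t value whose CDF equals p (assumes p > 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 6  # sample size from the example above
z_crit = NormalDist().inv_cdf(0.975)  # two-sided 95%, normal distribution
t_crit = t_critical(0.975, n - 1)     # two-sided 95%, t with n - 1 degrees of freedom

print(round(z_crit, 2), round(t_crit, 2))  # prints: 1.96 2.57
```

With only 5 degrees of freedom, the t critical value is roughly 30% larger than 1.96, so an interval built with the normal value would be noticeably too narrow.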