statistics


Any set of figures needs adjusting before it can be usefully reported | Ben Goldacre | Comment is free

http://www.guardian.co.uk/commentisfree/2011/aug/05/bad-science-adjusting-figures

Three tables showing rates of lung cancer among drinkers and non-drinkers, then adjusted for smokers and non-smokers.

Fox News was excited: "Unplanned children develop more slowly, study finds." The Telegraph was equally shrill in its headline ("IVF children have bigger vocabulary than unplanned children"). And the British Medical Journal press release drove it all: "Children born after an unwanted pregnancy are slower to develop." The last two, at least, made a good effort to explain that this effect disappeared when the researchers accounted for social and demographic factors. But was there ever any point in reporting the raw finding, from before this correction was made? I will now demonstrate, with a nerdy table illustration, how you correct for things such as social and demographic factors.
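A minimal sketch of the kind of adjustment the caption describes: comparing lung cancer rates for drinkers and non-drinkers, first crudely and then separately within smokers and non-smokers. The counts are invented for illustration and are not taken from the article; the function names are likewise hypothetical.

```python
# Hypothetical counts, invented for illustration only:
# (smoking status, drinking status) -> (people, lung cancer cases)
counts = {
    ("smoker", "drinker"): (800, 80),
    ("smoker", "non-drinker"): (200, 20),
    ("non-smoker", "drinker"): (200, 2),
    ("non-smoker", "non-drinker"): (800, 8),
}


def crude_rate(drinking):
    """Lung cancer rate for one drinking group, ignoring smoking."""
    people = sum(p for (_, d), (p, _c) in counts.items() if d == drinking)
    cases = sum(c for (_, d), (_p, c) in counts.items() if d == drinking)
    return cases / people


def stratum_rate(smoking, drinking):
    """Lung cancer rate for one drinking group within one smoking stratum."""
    people, cases = counts[(smoking, drinking)]
    return cases / people


print("Crude rates:")
for drinking in ("drinker", "non-drinker"):
    print(f"  {drinking}: {crude_rate(drinking):.1%}")

print("Rates within each smoking stratum:")
for smoking in ("smoker", "non-smoker"):
    for drinking in ("drinker", "non-drinker"):
        print(f"  {smoking}, {drinking}: {stratum_rate(smoking, drinking):.1%}")
```

With these made-up numbers the crude rates are 8.2% for drinkers and 2.8% for non-drinkers, yet within each smoking stratum the two groups have identical rates. The apparent effect of drinking comes entirely from drinkers being more likely to smoke.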
Data.gov.uk has become one of the finest national open data initiatives in the world - it now has more data than the mighty data.gov in the US, with 4,223 datasets, compared to 2,876 over the Atlantic. It's not perfect - far too many links take you to front pages on other sites, rather than the data itself. It could also do with more help for the less-experienced user, witness the multitude of downloads on the Treasury's Combined Online Information System (COINS) dataset ( http://data.gov.uk/dataset/coins ). But nevertheless, what a resource. And where it really comes into its own is in the publication of immense datasets previously kept within the confines of the civil service, many of which show highly local data. So, if I had to pick my top ten data.gov.uk datasets here is where I would start: http://data.gov.uk/blog/my-top-ten-datagovuk-datasets-guest-post-simon-rogers

My top ten data.gov.uk datasets - a guest post by Simon Rogers

http://www.visualizing.org/marathon2010

Marathon 2010

A 24-hour student data visualization competition Click here to download Visualizing Marathon 2010 Poster. Welcome Visualizing.org is proud to have held the inaugural Visualizing Marathon: a 24-hour student data visualization competition. Inspired by robotics competitions and science fairs, the Marathon was created to give design students an opportunity to collaborate and use design to help tackle real-world issues.
Last week, a series of media headlines suggested that immigrants were taking jobs away from British people as the economy enters recovery. These stories were based in part on new ONS statistics, bolstered in some of the papers by reference to a report from MigrationWatch which purported to show that recent immigration to the UK has caused higher unemployment. I wrote here about why these stories were misleading, and explained that the MigrationWatch report had failed to demonstrate the causal link claimed by the headline on its press release ("Immigration has damaged employment prospects for British workers", MigrationWatch press release, 12 August).

Why MigrationWatch is wrong — a plea for a more robust debate on immigration

http://www.newstatesman.com/blogs/the-staggers/2010/08/immigration-migrationwatch
http://www.guardian.co.uk/news/datablog/2010/aug/20/doctor-who-time-travel-information-is-beautiful

Doctor Who: Every single journey through time detailed by Information is Beautiful. As a spreadsheet | Television & radio

Doctor Who time travels of the Doctor: Information is Beautiful gets the data - what can you do? Illustration: David McCandless for the Guardian

Last year, I created a visualisation of Time travel in TV & Films.
A novel map of the internet created by Marián Boguñá and colleagues at the University of Barcelona, Spain, could help make network glitches a thing of the past. Boguñá squeezed the entire network into a disc using hyperbolic geometry, more familiar to us through the circular mosaic-like artworks of M. C.
http://www.newscientist.com/article/dn19420-escherlike-internet-map-could-speed-online-traffic.html

Escher-like internet map could speed online traffic - tech - 08 September 2010

Research tips

Last Thursday (28 March 2013), George Box passed away at the age of 93. He was one of the great statisticians of the last 100 years, and leaves an astonishingly diverse legacy. When I teach forecasting to my second year commerce students, we cover Box-Cox transformations, Box-Pierce and Ljung-Box tests, and Box-Jenkins modelling, and my students wonder if it is the same Box in all cases. It is. And we don’t even go near his work on response surface modelling, design of experiments, quality control or random number generation. http://robjhyndman.com/hyndsight/
information is beautiful

statistics blogs

http://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/

How to visualize data with cartoonish faces ala Chernoff

FlowingData reader Chris asks: I was wondering, have you ever considered doing a Chernoff faces tutorial for R? I think Chernoff faces are pretty interesting and I haven't seen much about them on the web. This wasn't the first time someone's asked how to make Chernoff faces, so I did a quick search. Guess what.
In probability theory, a stochastic system is one whose state is non-deterministic. The subsequent state of a stochastic system is determined both by the system's predictable actions and by a random element. A stochastic process is one whose behavior is non-deterministic; it can be thought of as a sequence of random variables. Any system or process that can be analyzed using probability theory is stochastic. [1][2] Stochastic systems and processes play a fundamental role in mathematical models of phenomena in many fields of science, engineering, and economics. http://en.wikipedia.org/wiki/Stochastic
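As a small illustration of a state determined by both a predictable action and a random element, here is a sketch of a random walk with drift. The function name, the drift parameter, and the choice of unit steps are assumptions made for this example, not something taken from the article.

```python
import random


def random_walk(steps, drift=0.0, seed=None):
    """Return the path of a 1-D random walk as a list of positions.

    Each new state is the previous state plus a predictable part (drift)
    plus a random part (a step of +1 or -1 with equal probability).
    """
    rng = random.Random(seed)  # a seeded generator makes the run repeatable
    position = 0.0
    path = [position]
    for _ in range(steps):
        position += drift + rng.choice((-1.0, 1.0))
        path.append(position)
    return path


# The path is a sequence of random variables: one position per time step.
print(random_walk(10, drift=0.5, seed=1))
```

Running it with different seeds gives different paths from the same starting state, which is what makes the process non-deterministic even though the drift is fixed.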

Stochastic

http://www.jeremymiles.co.uk/randomness/index.html

Randomness

There was a query on the SAS mailing list today - someone got inconsistent results for confidence intervals between Excel and SAS. In Excel, they were using the confidence() function, which I'd not come across before. And I'm glad about that. See, to calculate a confidence interval, you multiply the standard error by the critical value from the t-distribution. You can find that value using (say) R, with the qt() function or Excel, with the tinv() function. The t-distribution approaches the normal distribution as the sample size increases - you need a sample size of infinity for them to be exactly the same, but if the sample size is large enough, then it's close.
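The calculation described above can be sketched in Python using only the standard library. Since the standard library has no equivalent of qt() or tinv(), this sketch finds the critical value numerically (Simpson's rule for the CDF, then bisection); that approach, the helper names, and the sample data are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the density with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0  # assumed bracket; wide enough for a 95% interval
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(sample, confidence=0.95):
    """Mean plus or minus critical value times standard error of the mean."""
    n = len(sample)
    centre = mean(sample)
    std_err = stdev(sample) / math.sqrt(n)
    margin = t_critical(confidence, n - 1) * std_err
    return centre - margin, centre + margin


sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.2]  # made-up measurements
low, high = confidence_interval(sample)
z = NormalDist().inv_cdf(0.975)  # normal critical value, for comparison
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"t critical (df=7): {t_critical(0.95, 7):.3f}, normal: {z:.3f}")
print(f"t critical (df=1000): {t_critical(0.95, 1000):.3f}")
```

For a small sample the t critical value is noticeably larger than the normal one, so the interval is wider; with 1000 degrees of freedom the two are nearly the same.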

Skyrails Blog

Someone finally (thanks Christian) sent a mail to the skyrails-public mailing list, asking some questions, so I answered in the mailing list.