Scientific method: Statistical errors

For a brief moment in 2010, Matt Motyl was on the brink of scientific glory: he had discovered that extremists quite literally see the world in black and white. The results were “plain as day”, recalls Motyl, a psychology PhD student at the University of Virginia in Charlottesville. Data from a study of nearly 2,000 people seemed to show that political moderates saw shades of grey more accurately than did either left-wing or right-wing extremists. “The hypothesis was sexy,” he says, “and the data provided clear support.” The P value, a common index for the strength of evidence, was 0.01, usually interpreted as 'very significant'. Publication in a high-impact journal seemed within Motyl's grasp. But then reality intervened. It turned out that the problem was not in the data or in Motyl's analyses. For many scientists, this is especially worrying in light of the reproducibility concerns. Out of context: P values have always had critics.

The British amateur who debunked the mathematics of happiness Nick Brown does not look like your average student. He's 53 for a start and at 6ft 4in with a bushy moustache and an expression that jackknifes between sceptical and alarmed, he is reminiscent of a mid-period John Cleese. He can even sound a bit like the great comedian when he embarks on an extended sardonic riff, which he is prone to do if the subject rouses his intellectual suspicion. A couple of years ago that suspicion began to grow while he sat in a lecture at the University of East London, where he was taking a postgraduate course in applied positive psychology. There was a slide showing a butterfly graph – the branch of mathematical modelling most often associated with chaos theory. On the graph was a tipping point that claimed to identify the precise emotional co-ordinates that divide those people who "flourish" from those who "languish".

Use standard deviation (not mad about MAD) Nassim Nicholas Taleb recently wrote an article advocating that standard deviation be abandoned in favor of mean absolute deviation. Mean absolute deviation is indeed an interesting and useful measure, but there is a reason that standard deviation is important even if you do not like it: it prefers models that get totals and averages correct. Absolute deviation measures do not prefer such models. So while MAD may be great for reporting, it can be a problem when used to optimize models.

Scientific Regress by William A. Wilson The problem with science is that so much of it simply isn’t. Last summer, the Open Science Collaboration announced that it had tried to replicate one hundred published psychology experiments sampled from three of the most prestigious journals in the field. Scientific claims rest on the idea that experiments repeated under nearly identical conditions ought to yield approximately the same results, but until very recently, very few had bothered to check in a systematic way whether this was actually the case. The OSC was the biggest attempt yet to check a field’s results, and the most shocking. In many cases, they had used original experimental materials, and sometimes even performed the experiments under the guidance of the original researchers. Of the studies that had originally reported positive results, an astonishing 65 percent failed to show statistical significance on replication, and many of the remainder showed greatly reduced effect sizes.
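The claim in the standard deviation excerpt above, that squared-error measures prefer models which get totals and averages right while absolute-error measures do not, can be illustrated with the simplest possible "model": a single constant fitted to some data. The sketch below is an illustration only, not the method of the article being excerpted; the data values, the candidate grid, and the function names are all made up for the example.

```python
# Fit one constant to the data under two different loss functions.
# All names and numbers here are hypothetical, chosen for illustration.

def total_squared_error(data, c):
    """Sum of squared deviations of the data from the constant c."""
    return sum((x - c) ** 2 for x in data)

def total_absolute_error(data, c):
    """Sum of absolute deviations of the data from the constant c."""
    return sum(abs(x - c) for x in data)

def best_constant(data, loss, candidates):
    """Return the candidate constant with the smallest loss."""
    return min(candidates, key=lambda c: loss(data, c))

data = [1, 2, 3, 4, 100]                       # one large outlier
candidates = [i / 10 for i in range(0, 1001)]  # 0.0, 0.1, ..., 100.0

best_sq = best_constant(data, total_squared_error, candidates)
best_abs = best_constant(data, total_absolute_error, candidates)

print("squared-error fit: ", best_sq, "-> predicted total", best_sq * len(data))
print("absolute-error fit:", best_abs, "-> predicted total", best_abs * len(data))
print("actual total:      ", sum(data))
```

The squared-error fit lands on the mean, so predicting it for every item reproduces the actual total. The absolute-error fit lands on the median, which ignores the outlier and therefore misses the total.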

Is social psychology really in crisis? The headlines: "Disputed results a fresh blow for social psychology"; "Replication studies: Bad copy".

Taleb - Deviation The notion of standard deviation has confused hordes of scientists; it is time to retire it from common use and replace it with the more effective one of mean deviation. Standard deviation, STD, should be left to mathematicians, physicists and mathematical statisticians deriving limit theorems. There is no scientific reason to use it in statistical investigations in the age of the computer, as it does more harm than good, particularly with the growing class of people in social science mechanistically applying statistical tools to scientific problems. Say someone just asked you to measure the "average daily variations" for the temperature of your town (or for the stock price of a company, or the blood pressure of your uncle) over the past five days. The five changes are: (-23, 7, -3, 20, -1).
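For the five changes quoted in the Taleb excerpt above, the competing notions of "average daily variation" can be computed directly. This is a small sketch using only the Python standard library; the variable names are invented for the example, and the choice of which figures to show (mean absolute deviation, root mean square, and the n-1 sample standard deviation) is an assumption about what a reader might want to compare.

```python
from math import sqrt
from statistics import stdev

changes = [-23, 7, -3, 20, -1]
n = len(changes)

# These changes happen to average exactly zero, so deviations
# from the mean are just the values themselves.
average = sum(changes) / n

# Mean absolute deviation: drop the signs, then average.
mean_abs_dev = sum(abs(x - average) for x in changes) / n

# Root mean square: square, average, take the square root (divides by n).
rms = sqrt(sum((x - average) ** 2 for x in changes) / n)

# Sample standard deviation as most software reports it (divides by n - 1).
sample_sd = stdev(changes)

print(f"mean of changes:            {average:.1f}")
print(f"mean absolute deviation:    {mean_abs_dev:.2f}")
print(f"root mean square deviation: {rms:.2f}")
print(f"sample standard deviation:  {sample_sd:.2f}")
```

Both squared measures come out well above the mean absolute deviation, because squaring gives the two large swings (-23 and 20) more weight.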

The Conversation: Research and creative thinking can change the world. This means that academics have enormous power. But, as academics Asit Biswas and Julian Kirchherr have warned, the overwhelming majority are not shaping today’s public debates.

[19] Fake Data: Mendel vs. Stapel Diederik Stapel, Dirk Smeesters, and Lawrence Sanna published psychology papers with fake data. They each faked in their own idiosyncratic way; nevertheless, their data do share something in common. Real data are noisy. Theirs aren’t. Gregor Mendel’s data also lack noise (yes, famous peas-experimenter Mendel).

The problem with p values: how significant are they, really? For researchers there’s a lot that turns on the p value, the number used to determine whether a result is statistically significant. The current consensus is that if p is less than .05, a study has reached the holy grail of being statistically significant, and therefore likely to be published. Over .05 and it’s usually back to the drawing board. But today, Texas A&M University professor Valen Johnson, writing in the prestigious journal Proceedings of the National Academy of Sciences, argues that p less than .05 is far too weak a standard. Using .05 is, he contends, a key reason why false claims are published and many published results fail to replicate. He advocates requiring .005 or even .001 as the criterion for statistical significance.
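To make the thresholds in the p value excerpt above concrete, the sketch below simulates many studies in which there is no real effect and counts how often each cutoff would still declare significance. It is only an illustration of what a significance threshold controls. It does not reproduce Johnson's argument, and every modelling choice in it is an assumption made for the example: a two-sided z-test, normal data with known standard deviation 1, a sample size of 30, and 10,000 simulated studies.

```python
import random
from statistics import NormalDist

def two_sided_p(sample, sigma=1.0):
    """Two-sided z-test p value for 'true mean is 0', with known sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(alpha, trials=10_000, n=30, seed=0):
    """Share of no-effect studies that still reach p < alpha."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        # The null hypothesis is true here: the data have mean 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample) < alpha:
            hits += 1
    return hits / trials

for alpha in (0.05, 0.005, 0.001):
    rate = false_positive_rate(alpha)
    print(f"threshold p < {alpha}: {rate:.4f} of no-effect studies 'significant'")
```

Under these assumptions each rate should sit close to its threshold, so a stricter cutoff lets through fewer no-effect studies. How many published findings are false also depends on how many tested hypotheses are true to begin with and on statistical power, which this sketch does not model.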

Karl Popper: What Makes a Theory Scientific It’s not immediately clear, to the layman, what the essential difference is between science and something masquerading as science: pseudoscience. The distinction gets at the core of what comprises human knowledge: How do we actually know something to be true? Is it simply because our powers of observation tell us so? Or is there more to it?

[21] Fake-Data Colada Recently, a psychology paper (.pdf) was flagged as possibly fraudulent based on statistical analyses (.pdf). The author defended his paper (.html), but the university committee investigating misconduct concluded it had occurred (.pdf). In this post we present new and more intuitive versions of the analyses that flagged the paper as possibly fraudulent. We then rule out p-hacking among other benign explanations. Excessive linearity: The whistleblowing report pointed out the suspicious paper had excessively linear results.

A Taxonomy of Data Science Posted September 25th, 2010, by Hilary Mason. Both within the academy and within tech startups, we’ve been hearing some similar questions lately: Where can I find a good data scientist? What do I need to learn to become a data scientist? Or more succinctly: What is data science?

The Amazing Significo: why researchers need to understand poker Suppose I tell you that I know of a magician, The Amazing Significo, with extraordinary powers. He can undertake to deal you a five-card poker hand which has three cards with the same number. You open a fresh pack of cards, shuffle the pack and watch him carefully. The Amazing Significo deals you five cards and you find that you do indeed have three of a kind. According to Wikipedia, the chance of this happening by chance when dealing from an unbiased deck of cards is around 2 per cent - so you are likely to be impressed. You may go public to endorse The Amazing Significo's claim to have supernatural abilities.
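The "around 2 per cent" figure quoted in the Significo excerpt above can be checked by counting hands. This is a minimal sketch using the standard combinatorial count for exactly three of a kind, meaning three cards of one rank plus two cards of two other, different ranks, so full houses and four of a kind are excluded. The variable names are invented for the example.

```python
from math import comb

# Choose the rank of the triple, then which 3 of its 4 suits.
triple_choices = 13 * comb(4, 3)

# Choose 2 different ranks for the other two cards from the 12 left,
# then one of 4 suits for each of them.
kicker_choices = comb(12, 2) * 4 * 4

three_of_a_kind_hands = triple_choices * kicker_choices
all_hands = comb(52, 5)

probability = three_of_a_kind_hands / all_hands

print(f"three-of-a-kind hands: {three_of_a_kind_hands:,}")
print(f"all five-card hands:   {all_hands:,}")
print(f"probability:           {probability:.4f}")
```

The result is a little over 2 per cent, consistent with the figure in the excerpt.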