
Scientific method: Statistical errors

For a brief moment in 2010, Matt Motyl was on the brink of scientific glory: he had discovered that extremists quite literally see the world in black and white. The results were “plain as day”, recalls Motyl, a psychology PhD student at the University of Virginia in Charlottesville. Data from a study of nearly 2,000 people seemed to show that political moderates saw shades of grey more accurately than did either left-wing or right-wing extremists. “The hypothesis was sexy,” he says, “and the data provided clear support.” The P value, a common index for the strength of evidence, was 0.01 — usually interpreted as 'very significant'. Publication in a high-impact journal seemed within Motyl's grasp. But then reality intervened: the effect did not hold up. It turned out that the problem was not in the data or in Motyl's analyses; it lay in the surprisingly slippery nature of the P value itself. For many scientists, this is especially worrying in light of wider concerns about reproducibility. Out-of-context P values have always had critics.

The British amateur who debunked the mathematics of happiness | Science | The Observer Nick Brown does not look like your average student. He's 53 for a start and at 6ft 4in with a bushy moustache and an expression that jackknifes between sceptical and alarmed, he is reminiscent of a mid-period John Cleese. He can even sound a bit like the great comedian when he embarks on an extended sardonic riff, which he is prone to do if the subject rouses his intellectual suspicion. A couple of years ago that suspicion began to grow while he sat in a lecture at the University of East London, where he was taking a postgraduate course in applied positive psychology. There was a slide showing a butterfly graph – the branch of mathematical modelling most often associated with chaos theory. On the graph was a tipping point that claimed to identify the precise emotional co-ordinates that divide those people who "flourish" from those who "languish". According to the graph, it all came down to a specific ratio of positive emotions to negative emotions. It was as simple as that.

Still Not Significant What to do if your p-value is just over the arbitrary threshold for ‘significance’ of p=0.05? You don’t need to play the significance testing game – there are better methods, like quoting the effect size with a confidence interval – but if you do, the rules are simple: the result is either significant or it is not. So if your p-value remains stubbornly higher than 0.05, you should call it ‘non-significant’ and write it up as such. The problem for many authors is that this just isn’t the answer they were looking for: publishing so-called ‘negative results’ is harder than ‘positive results’. The solution is to apply the time-honoured tactic of circumlocution to disguise the non-significant result as something more interesting. As well as being statistically flawed (results are either significant or not and can’t be qualified), the wording is linguistically interesting, often describing an aspect of the result that just doesn’t exist.
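The suggested alternative, quoting the effect size with a confidence interval, is easy to do in practice. A minimal sketch in Python using a normal approximation; the data and the function name are invented for illustration:

```python
import math
import statistics

def mean_diff_ci(a, b, z=1.96):
    """Point estimate and approximate 95% CI for the difference in means
    (normal approximation with the usual standard-error formula)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, diff - z * se, diff + z * se

# Made-up measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.7]
group_b = [4.8, 5.0, 4.6, 5.1, 4.7, 4.9, 5.2, 4.5]

diff, lo, hi = mean_diff_ci(group_a, group_b)
print(f"difference: {diff:.2f}, 95% CI: ({lo:.2f}, {hi:.2f})")
```

Reporting the interval tells the reader both the plausible size of the effect and its uncertainty, which a bare "significant" or "non-significant" verdict does not.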

Scientific Regress by William A. Wilson The problem with science is that so much of it simply isn’t. Last summer, the Open Science Collaboration announced that it had tried to replicate one hundred published psychology experiments sampled from three of the most prestigious journals in the field. Scientific claims rest on the idea that experiments repeated under nearly identical conditions ought to yield approximately the same results, but until very recently, very few had bothered to check in a systematic way whether this was actually the case. The OSC was the biggest attempt yet to check a field’s results, and the most shocking. In many cases, the replicators had used the original experimental materials, and sometimes even performed the experiments under the guidance of the original researchers. Of the studies that had originally reported positive results, an astonishing 65 percent failed to show statistical significance on replication, and many of the remainder showed greatly reduced effect sizes.

The Fight Over Tesla Shows How Little Value Dealerships Add - Karan Girotra and Serguei Netessine by Karan Girotra and Serguei Netessine | 8:00 AM March 21, 2014 Last week New Jersey started enforcing a ban on direct sales by Tesla Motors of its path-breaking Model S. Tesla’s direct sales have also landed in hot water in a number of other states: Ohio lawmakers are debating a ban on Tesla’s direct sales, and Texas, Arizona, and Virginia are also opposed. Proponents of a ban on direct sales claim that they are acting in the interest of customers. But is it the interests of customers they’re following, or rather the bidding of the powerful car dealership lobby? Car dealers, and more generally intermediaries, represent an extra layer of companies in the supply chain that clearly increases costs to customers. Search and discovery: In the same way eBay helps turn one person’s junk into another person’s collectible, or Airbnb makes your empty guest room a hotel room, intermediaries can help buyers find sellers.

How to Make a Concept Model I can draw. I went to art school. I studied painting until I fell out with the abstract expressionists and switched to photography. What I cannot do is diagram. Other than the oh-god-my-eyes color choices, my social architecture diagram has deeper problems. A concept model is a visual representation of a set of ideas that clarifies the concept for both the thinker and the audience. The best known concept model in the user experience profession is probably Jesse James Garrett’s “Elements of User Experience.” If you wish to clearly present a set of ideas to an audience and represent how they fit together, a diagram is much more powerful than words alone. “The more we draw, the more our ideas become visible, and as they become visible they become clear, and as they become clear they become easier to discuss—which in the virtuous cycle of visual thinking prompts us to discuss even more.” Concept models can serve many purposes. … or to help teammates understand how the site currently works…

Is social psychology really in crisis? The headlines: “Disputed results a fresh blow for social psychology”; “Replication studies: Bad copy”. The story: Controversy is simmering in the world of psychology research over claims that many famous effects reported in the literature aren’t reliable, or may not even exist at all. The latest headlines follow the publication of experiments which failed to replicate a landmark study by Dutch psychologist Ap Dijksterhuis. What they actually did: The first of Dijksterhuis' original experiments asked people to think about the typical university professor and list on paper their appearance, lifestyle and behaviours. The experiment found that people who had thought about professors scored 10% higher on a general-knowledge test than people who hadn’t been primed in this way. How plausible is it? It’s extremely plausible that people are influenced by recent activities and thoughts – the concept of priming is beyond question, having been supported by decades of research.

Use standard deviation (not mad about MAD) Nassim Nicholas Taleb recently wrote an article advocating the abandonment of the use of standard deviation in favour of mean absolute deviation. Mean absolute deviation is indeed an interesting and useful measure, but there is a reason that standard deviation is important even if you do not like it: it prefers models that get totals and averages correct. Absolute deviation measures do not prefer such models. So while MAD may be great for reporting, it can be a problem when used to optimize models. Let’s suppose we have 2 boxes of 10 lottery tickets: all tickets were purchased for $1 each for the same game in an identical fashion at the same time. Now, since all tickets are identical, if we are making a mere point-prediction (a single number value estimate for each ticket instead of a detailed posterior distribution), then there is an optimal prediction that is a single number V. Suppose we use mean absolute deviation as our measure of model quality.
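The lottery-ticket argument can be made concrete with a brute-force sketch (the payouts below are hypothetical numbers chosen for illustration, not from the post): minimising squared error picks a prediction whose implied total matches the actual total, while minimising absolute error does not.

```python
def total_squared_error(v, xs):
    """Sum of squared errors for a single point-prediction v."""
    return sum((x - v) ** 2 for x in xs)

def total_absolute_error(v, xs):
    """Sum of absolute errors for a single point-prediction v."""
    return sum(abs(x - v) for x in xs)

# Hypothetical box of 10 tickets: nine pay nothing, one pays $10.
payouts = [0] * 9 + [10]

candidates = [i / 10 for i in range(0, 101)]  # search predictions 0.0 .. 10.0
best_sq = min(candidates, key=lambda v: total_squared_error(v, payouts))
best_abs = min(candidates, key=lambda v: total_absolute_error(v, payouts))

print(best_sq)   # 1.0 -- the mean, so the predicted box total ($10) matches reality
print(best_abs)  # 0.0 -- the median, so the predicted box total is $0 for a $10 box
```

Predicting $1 per ticket makes the predicted total match the actual total; predicting $0 minimises absolute error but writes the box off entirely, which is the sense in which MAD-optimal models can get totals wrong.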

The Conversation: Research and creative thinking can change the world. This means that academics have enormous power. But, as academics Asit Biswas and Julian Kirchherr have warned, the overwhelming majority are not shaping today’s public debates. Instead, their work is largely sitting in academic journals that are read almost exclusively by their peers. Up to 1.5 million peer-reviewed articles are published annually. This suggests that a lot of great thinking and many potentially world-altering ideas are not getting into the public domain. Why? The answer appears to be threefold: a narrow idea of what academics should or shouldn’t do; a lack of incentives from universities or governments; and a lack of training in the art of explaining complex concepts to a lay audience. The ‘intellectual mission’: Some academics insist that it’s not their job to write for the general public. The counter-argument is that academics can’t operate in isolation from the world’s very real problems.

Brains flush toxic waste in sleep, including Alzheimer’s-linked protein, study of mice finds Scientists say this nightly self-clean by the brain provides a compelling biological reason for the restorative power of sleep. “Sleep puts the brain in another state where we clean out all the byproducts of activity during the daytime,” said study author and University of Rochester neurosurgeon Maiken Nedergaard. Those byproducts include beta-amyloid protein, clumps of which form plaques found in the brains of Alzheimer’s patients. Staying up all night could prevent the brain from getting rid of these toxins as efficiently, and explain why sleep deprivation has such strong and immediate consequences. Although as essential and universal to the animal kingdom as air and water, sleep is a riddle that has baffled scientists and philosophers for centuries. One line of thinking was that sleep helps animals to conserve energy by forcing a period of rest. Another puzzle involves why different animals require different amounts of sleep per night.

9 questions about Syria you were too embarrassed to ask The United States and allies are preparing for a possibly imminent series of limited military strikes against Syria, the first direct U.S. intervention in the two-year civil war, in retaliation for President Bashar al-Assad's suspected use of chemical weapons against civilians. If you found the above sentence kind of confusing, or aren't exactly sure why Syria is fighting a civil war, or even where Syria is located, then this is the article for you. What's happening in Syria is really important, but it can also be confusing and difficult to follow even for those of us glued to it. Here, then, are the most basic answers to your most basic questions. First, a disclaimer: Syria and its history are really complicated; this is not an exhaustive or definitive account of that entire story, just some background, written so that anyone can understand it. Read award-winning novelist Teju Cole's funny and insightful parody of this article, "9 questions about Britain you were too embarrassed to ask".

Data Colada | [19] Fake Data: Mendel vs. Stapel Diederik Stapel, Dirk Smeesters, and Lawrence Sanna published psychology papers with fake data. They each faked in their own idiosyncratic way; nevertheless, their data do share something in common. Real data are noisy. Theirs aren’t. Gregor Mendel’s data also lack noise (yes, the famous peas-experimenter Mendel). But Mendel, unlike the psychologists, had a motive. Excessive similarity: To get a sense for what we are talking about, let’s look at the study that first suggested Smeesters was a fabricateur (see retracted paper .pdf). Results are as predicted, but the condition means are more similar to one another than sampling error should allow. Stapel and Sanna had data with the same problem. How Mendel is like Stapel, Smeesters & Sanna: Mendel famously crossed plants and observed the share of baby plants with a given trait, and his counts sit closer to the predicted ratios than chance should permit; recall that Smeesters’ data had a 27/100,000 chance of arising if the data were real. How Mendel is not like Stapel, Smeesters & Sanna: Mendel wanted his data to look like his theory. Imagine Mendel runs an experiment and gets 27% instead of 33% of baby plants with a trait.
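The "too little noise" diagnostic can be sketched as a simulation: generate condition means under honest sampling and ask how often they come out as similar as the reported ones. Everything here (the sample sizes, the true mean and spread, the "reported" means) is made up for illustration; this is not the actual Data Colada analysis, just the shape of the idea.

```python
import random
import statistics

random.seed(1)

def spread(means):
    """Range of the condition means; fabricated data tend to have a suspiciously small range."""
    return max(means) - min(means)

def simulated_spread(n_conditions=3, n_subjects=15, mu=50, sigma=10):
    """Spread of condition means under honest sampling (hypothetical study parameters)."""
    means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n_subjects))
             for _ in range(n_conditions)]
    return spread(means)

observed = spread([50.1, 50.2, 49.9])  # hypothetical, suspiciously similar reported means
sims = [simulated_spread() for _ in range(10_000)]
p = sum(s <= observed for s in sims) / len(sims)
print(f"chance of means this similar under honest sampling: p = {p:.4f}")
```

A tiny p here says the reported means are closer together than sampling noise plausibly allows, which is exactly the pattern that flagged Smeesters, Stapel, Sanna, and Mendel alike.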

Taleb - Deviation The notion of standard deviation has confused hordes of scientists; it is time to retire it from common use and replace it with the more effective one of mean deviation. Standard deviation, STD, should be left to mathematicians, physicists and mathematical statisticians deriving limit theorems. There is no scientific reason to use it in statistical investigations in the age of the computer, as it does more harm than good—particularly with the growing class of people in social science mechanistically applying statistical tools to scientific problems. Say someone just asked you to measure the "average daily variations" for the temperature of your town (or for the stock price of a company, or the blood pressure of your uncle) over the past five days. The five changes are: (-23, 7, -3, 20, -1). Do you take every observation: square it, average the total, then take the square root? It all comes from bad terminology for something non-intuitive.
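Taleb's five changes make the two measures easy to compare directly; a quick sketch of the standard textbook computations:

```python
import math
import statistics

changes = [-23, 7, -3, 20, -1]
m = statistics.mean(changes)  # exactly 0.0 for these five values

# Mean absolute deviation: the average distance from the mean.
mad = sum(abs(x - m) for x in changes) / len(changes)

# (Population) standard deviation: square root of the average squared distance.
std = math.sqrt(sum((x - m) ** 2 for x in changes) / len(changes))

print(mad)            # 10.8
print(round(std, 2))  # 14.06
```

The single -23 contributes 23 to the numerator of MAD but 529 to that of the standard deviation, which is why the two numbers diverge: squaring weights large deviations disproportionately, and that is the non-intuitive step Taleb objects to.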
