Scientific method: Statistical errors

For a brief moment in 2010, Matt Motyl was on the brink of scientific glory: he had discovered that extremists quite literally see the world in black and white. The results were “plain as day”, recalls Motyl, a psychology PhD student at the University of Virginia in Charlottesville. Data from a study of nearly 2,000 people seemed to show that political moderates saw shades of grey more accurately than did either left-wing or right-wing extremists. “The hypothesis was sexy,” he says, “and the data provided clear support.” The P value, a common index for the strength of evidence, was 0.01, usually interpreted as 'very significant'. Publication in a high-impact journal seemed within Motyl's grasp. But then reality intervened. It turned out that the problem was not in the data or in Motyl's analyses. For many scientists, this is especially worrying in light of the reproducibility concerns. P values have always had critics.

Still Not Significant What to do if your p-value is just over the arbitrary threshold for ‘significance’ of p=0.05? You don’t need to play the significance testing game – there are better methods, like quoting the effect size with a confidence interval – but if you do, the rules are simple: the result is either significant or it is not. So if your p-value remains stubbornly higher than 0.05, you should call it ‘non-significant’ and write it up as such. The problem for many authors is that this just isn’t the answer they were looking for: publishing so-called ‘negative results’ is harder than ‘positive results’. The solution is to apply the time-honoured tactic of circumlocution to disguise the non-significant result as something more interesting. As well as being statistically flawed (results are either significant or not and can’t be qualified), the wording is linguistically interesting, often describing an aspect of the result that just doesn’t exist.
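As a rough illustration of "quoting the effect size with a confidence interval", here is a minimal Python sketch. The two samples and the function name mean_difference_ci are hypothetical, and the interval uses a normal approximation; a t-based interval would be somewhat wider for samples this small.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(a, b, level=0.95):
    """Return (difference, lower, upper) for mean(a) - mean(b).

    Normal-approximation interval with unpooled standard errors.
    This is an illustrative choice, not the only reasonable one.
    """
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return diff, diff - z * se, diff + z * se


# Hypothetical measurements for two groups.
treated = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
control = [4.8, 5.0, 4.7, 5.2, 4.9, 5.1]

diff, lower, upper = mean_difference_ci(treated, control)
print(f"difference = {diff:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Reporting the difference together with its interval conveys both the size of the effect and its uncertainty, without forcing a significant/non-significant verdict.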

Use standard deviation (not mad about MAD) Nassim Nicholas Taleb recently wrote an article advocating abandoning standard deviation in favor of mean absolute deviation. Mean absolute deviation is indeed an interesting and useful measure, but there is a reason that standard deviation is important even if you do not like it: it prefers models that get totals and averages correct. Absolute deviation measures do not prefer such models. So while MAD may be great for reporting, it can be a problem when used to optimize models. Let’s suppose we have 2 boxes of 10 lottery tickets: all tickets were purchased for $1 each for the same game in an identical fashion at the same time. Now, since all tickets are identical, if we are making a mere point-prediction (a single number value estimate for each ticket instead of a detailed posterior distribution) then there is an optimal prediction that is a single number V. Suppose we use mean absolute deviation as our measure of model quality.
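The point about totals can be sketched numerically. The payoffs below are hypothetical (one box of 10 tickets in which a single ticket pays $5 and the rest pay nothing), and the grid search is just a simple way to find the best single prediction V under each loss.

```python
def mean_abs_dev(v, payoffs):
    """Mean absolute deviation of the payoffs from the prediction v."""
    return sum(abs(p - v) for p in payoffs) / len(payoffs)


def mean_sq_dev(v, payoffs):
    """Mean squared deviation of the payoffs from the prediction v."""
    return sum((p - v) ** 2 for p in payoffs) / len(payoffs)


# Hypothetical box: nine losing tickets and one $5 winner.
payoffs = [0.0] * 9 + [5.0]

# Candidate predictions from $0.00 to $5.00 in one-cent steps.
candidates = [cents / 100 for cents in range(0, 501)]

best_abs = min(candidates, key=lambda v: mean_abs_dev(v, payoffs))
best_sq = min(candidates, key=lambda v: mean_sq_dev(v, payoffs))

print("best V under absolute deviation:", best_abs)
print("best V under squared deviation: ", best_sq)
print("actual total payoff:            ", sum(payoffs))
print("predicted total (absolute):     ", best_abs * len(payoffs))
print("predicted total (squared):      ", best_sq * len(payoffs))
```

Under absolute deviation the best single prediction is the median payoff, so the predicted total for the box misses the actual total; under squared deviation the best prediction is the mean, which reproduces the total.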

Taleb - Deviation The notion of standard deviation has confused hordes of scientists; it is time to retire it from common use and replace it with the more effective one of mean deviation. Standard deviation, STD, should be left to mathematicians, physicists and mathematical statisticians deriving limit theorems. There is no scientific reason to use it in statistical investigations in the age of the computer, as it does more harm than good—particularly with the growing class of people in social science mechanistically applying statistical tools to scientific problems. Say someone just asked you to measure the "average daily variations" for the temperature of your town (or for the stock price of a company, or the blood pressure of your uncle) over the past five days. The five changes are: (-23, 7, -3, 20, -1). Do you take every observation: square it, average the total, then take the square root? It all comes from bad terminology for something non-intuitive.
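The two calculations being contrasted can be written out for the five changes quoted above. This sketch follows the wording literally (square, average, take the square root), so it divides by n; a sample standard deviation with an n - 1 denominator would come out larger.

```python
from math import sqrt

changes = [-23, 7, -3, 20, -1]  # the five changes quoted in the text
n = len(changes)

# "Remove the sign and calculate the average": mean absolute deviation.
mean_abs = sum(abs(c) for c in changes) / n

# "Square it, average the total, then take the square root".
root_mean_sq = sqrt(sum(c * c for c in changes) / n)

print(f"mean absolute deviation: {mean_abs:.2f}")
print(f"root mean square:        {root_mean_sq:.2f}")
print(f"ratio:                   {root_mean_sq / mean_abs:.2f}")
```

The changes happen to sum to zero, so deviations from the mean and the raw changes coincide here. The squared version is larger because squaring gives the two big moves (-23 and 20) more weight.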

Instrumental Variables Jan 10, 2014 Instrumental variables are an incredibly powerful tool for dealing with unobserved heterogeneity within the context of regression, but the language used to define them is mind bending. Typically, you hear something along the lines of “an instrumental variable is a variable that is correlated with x but uncorrelated with the outcome except through x.” At this point either examples are listed (taxes on smoking likely affect health only through their actions on smoking) or the author drops right into the math stats. I like math stats (when I am not getting a grade for it at least!) and will work through it. I turned to Google and did several searches and the only simple simulation that I could find was done using Stata. Overview Suppose that you have a continuous variable with the known mean response function and further that and are correlated with each other. but in this case where is white noise centered on zero. and where is some latent part of and is still unobserved. Simulations cor(x, c)
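A small simulation makes the quoted definition concrete. This is a sketch under stated assumptions, not the post's own simulation: x and c follow the names in the excerpt, while the outcome y, the instrument z, the coefficients (true slope 2, confounder effect 3) and the function names are all hypothetical choices made for illustration.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def simulate(n=20_000, seed=1):
    """Return (naive OLS slope, IV slope) for a confounded regression."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: moves x, not y directly
    x = [ci + zi + rng.gauss(0, 1) for ci, zi in zip(c, z)]
    # Assumed data-generating process: true effect of x on y is 2.
    y = [1 + 2 * xi + 3 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

    ols_slope = cov(x, y) / cov(x, x)  # biased, because c is omitted
    iv_slope = cov(z, y) / cov(z, x)   # single-instrument IV estimator
    return ols_slope, iv_slope


ols_slope, iv_slope = simulate()
print(f"naive OLS slope: {ols_slope:.2f}")
print(f"IV slope:        {iv_slope:.2f}")
```

Because c drives both x and y, the naive regression attributes part of the confounder's effect to x. The instrument z shifts x without touching y except through x, so the ratio of covariances recovers a slope close to the assumed true value.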

datasharing Weak statistical standards implicated in scientific irreproducibility The plague of non-reproducibility in science may be mostly due to scientists’ use of weak statistical tests, as shown by an innovative method developed by statistician Valen Johnson, at Texas A&M University in College Station. Johnson compared the strength of two types of tests: frequentist tests, which measure how unlikely a finding is to occur by chance, and Bayesian tests, which measure the likelihood that a particular hypothesis is correct given data collected in the study. The strength of the results given by these two types of tests had not been compared before, because they ask slightly different types of questions. So Johnson developed a method that makes the results given by the tests (the P value in the frequentist paradigm, and the Bayes factor in the Bayesian paradigm) directly comparable. Johnson then used these uniformly most powerful tests to compare P values to Bayes factors. Indeed, as many as 17–25% of such findings are probably false, Johnson calculates.

Significantly misleading Author: Mark Kelly Mark Twain with characteristic panache said ‘…I am dead to adverbs, they cannot excite me’. Stephen King agrees, saying ‘The road to hell is paved with adverbs’. The idea, of course, is that if you are using an adverb you have chosen the wrong verb. What are we to make then of the ubiquitous ‘statistically significantly related’? ‘Statistically significant’ is a tremendously ugly phrase but unfortunately that is the least of its shortcomings. Imagine if an environmentalist said that oil contamination was detectable in a sample of water from a protected coral reef. What we mean by a ‘statistically significant’ difference is that the difference is ‘unlikely to be zero’. Statistically discernible is still 50% adverb however.

mirador Mirador is a tool for visual exploration of complex datasets. It enables users to discover correlation patterns and derive new hypotheses from the data. Download: version 1.3 (8 December 2014), for Windows and Mac OS X. Mirador is an open source project released under the GNU Public License v2. Further reading: Ebola prognosis prediction: computational methods for patient prognosis based on available clinical data (June 9th, 2015); Ebola data release: de-identified clinical data from Ebola patients treated at the Kenema Government Hospital in Sierra Leone between May and June of 2014 (February 26th, 2015); Awards from the Department of Health and Human Services: Mirador received the third place, innovation and scientific excellence awards in the HHS VizRisk challenge (January 5th, 2015); Winning entries in the Mirador Data Competition: read about the winning correlations submitted by Mirador users (December 1st, 2014).

QQ Plots for NY's Ozone Pollution Data Introduction Continuing my recent series on exploratory data analysis, today’s post focuses on quantile-quantile (Q-Q) plots, which are very useful plots for assessing how closely a data set fits a particular distribution. I will discuss how Q-Q plots are constructed and use Q-Q plots to assess the distribution of the “Ozone” data from the built-in “airquality” data set in R. What is a Quantile-Quantile Plot? A quantile-quantile plot, or Q-Q plot, is a plot of the sorted quantiles of one data set against the sorted quantiles of another data set. The sample sizes of the 2 data sets do not have to be equal. The quantiles of the 2 data sets can be observed or theoretical. Constructing Quantile-Quantile Plots to Check Goodness of Fit The following steps will build a Q-Q plot to check how well a data set fits a particular theoretical distribution.
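One common way to build the points of a normal Q-Q plot is sketched below in Python rather than R. The sample values are hypothetical (not the airquality data), the function name normal_qq_points is made up for this example, and the plotting positions (i - 0.5)/n are one conventional choice among several; the post's own steps may differ.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Pair each sorted observation with a normal quantile.

    The reference distribution is a normal with the sample's own mean
    and standard deviation; plotting positions are (i - 0.5) / n.
    """
    ordered = sorted(sample)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical measurements, for illustration only.
sample = [12.0, 30.5, 18.2, 45.1, 22.7]

for theo, obs in normal_qq_points(sample):
    print(f"theoretical {theo:7.2f}   observed {obs:7.2f}")
```

Plotting the observed values against the theoretical ones gives the Q-Q plot; points lying close to the line y = x suggest the fitted normal describes the data reasonably well.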

Prob and Stats Cookbook The cookbook contains a succinct representation of various topics in probability theory and statistics. It provides a comprehensive reference reduced to the mathematical essence, rather than aiming for elaborate explanations. Last updated: January 24, 2014. Language: English. The LaTeX source code is available on GitHub and comes with a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Feel encouraged to extend the cookbook by forking it and submitting pull requests. To reproduce in a different context, please contact me. The cookbook aims to be language agnostic and factors out its textual elements into a separate dictionary. The current translation setup is heavily geared to Roman languages, as this was the easiest way to begin with. Here are the 3 most recent entries of the changelog file (for all versions of the cookbook): 2014-01-24 Matthias Vallentin <vallentin@icir.org> * Fix wrong denominator in alternative CLT representations.

Absolute Deviation Around the Median Median Absolute Deviation (MAD), or Absolute Deviation Around the Median as stated in the title, is a robust measure of variability. Robust statistics are statistics with good performance for data drawn from a wide range of non-normally distributed probability distributions. Unlike the standard mean/standard deviation combo, MAD is not sensitive to the presence of outliers. This robustness is well illustrated by the median’s breakdown point (Donoho & Huber, 1983). The interquartile range is also resistant to the influence of outliers, although the mean and median absolute deviation are better in that they can be converted into values that approximate the standard deviation. Essentially the breakdown point for a parameter (median, mean, variance, etc.) is the proportion or number of arbitrarily small or large extreme values that must be introduced into a sample to cause the estimator to yield an arbitrarily bad result.
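A minimal Python sketch of the calculation, using a hypothetical data set with one extreme value. The function name and data are made up for illustration; the scale factor 1.4826 is the usual constant for making MAD comparable to the standard deviation when the data are normally distributed.

```python
from statistics import median, pstdev


def median_abs_deviation(values, scale=1.0):
    """Median of absolute deviations from the median, times `scale`.

    With scale=1.4826 the result approximates the standard deviation
    for normally distributed data.
    """
    m = median(values)
    return scale * median([abs(v - m) for v in values])


# Hypothetical sample with a single outlier (100).
data = [2, 3, 3, 4, 5, 100]

mad = median_abs_deviation(data)
mad_scaled = median_abs_deviation(data, scale=1.4826)
sd = pstdev(data)

print("median:            ", median(data))
print("MAD:               ", mad)
print("scaled MAD:        ", round(mad_scaled, 4))
print("standard deviation:", round(sd, 2))
```

The single outlier inflates the standard deviation dramatically, while the MAD still reflects the spread of the bulk of the data.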

Chart of distribution relationships Probability distributions have a surprising number of interconnections. A dashed line in the chart below indicates an approximate (limit) relationship between two distribution families. A solid line indicates an exact relationship: special case, sum, or transformation. The chart above is adapted from the chart originally published by Lawrence Leemis in 1986 (Relationships Among Common Univariate Distributions, American Statistician 40:143-146.) Parameterizations The precise relationships between distributions depend on parameterization. Let C(n, k) denote the binomial coefficient (n choose k) and B(a, b) = Γ(a) Γ(b) / Γ(a + b). Geometric: f(x) = p (1-p)^x for non-negative integers x. Discrete uniform: f(x) = 1/n for x = 1, 2, ..., n. Poisson: f(x) = exp(-λ) λ^x / x! for non-negative integers x. Uniform: f(x) = 1 for 0 ≤ x ≤ 1.
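The discrete parameterizations listed above translate directly into code. The sketch below uses hypothetical function names and parameter values, and simply checks numerically that each mass function sums to (approximately) one over its support, which is a quick way to confirm which parameterization is in use.

```python
from math import exp, factorial


def geometric_pmf(x, p):
    """f(x) = p (1-p)^x for x = 0, 1, 2, ... (failures before first success)."""
    return p * (1 - p) ** x


def discrete_uniform_pmf(x, n):
    """f(x) = 1/n for x = 1, 2, ..., n, and 0 elsewhere."""
    return 1 / n if 1 <= x <= n else 0.0


def poisson_pmf(x, lam):
    """f(x) = exp(-lam) lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)


# Truncated sums; the tails beyond these ranges are negligible
# for the parameter values chosen here.
geom_total = sum(geometric_pmf(x, 0.3) for x in range(200))
unif_total = sum(discrete_uniform_pmf(x, 6) for x in range(1, 7))
pois_total = sum(poisson_pmf(x, 3.0) for x in range(100))

print(f"geometric sum:        {geom_total:.6f}")
print(f"discrete uniform sum: {unif_total:.6f}")
print(f"Poisson sum:          {pois_total:.6f}")
```

Note that this geometric distribution starts at x = 0; some references instead count trials and start at x = 1, which is exactly the kind of difference the remark about parameterization warns about.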