An intro to power and sample size estimation -- Jones et al. 20 (5): 453 -- Emergency Medicine Journal. Correspondence to: Dr S R Jones, Emergency Department, Manchester Royal Infirmary, Oxford Road, Manchester M13 9WL, UK; steve.r.jones@bigfoot.com. Abstract: The importance of power and sample size estimation for study design and analysis. Objective: understand power and sample size estimation. Power and sample size estimations are measures of how many patients are needed in a study. In previous articles in the series on statistics published in this journal, statistical inference has been used to determine whether the results found are true or possibly due to chance alone. Power and sample size estimations are used by researchers to determine how many subjects are needed to answer the research question (or to test the null hypothesis). An example is the case of thrombolysis in acute myocardial infarction (AMI). Generally these trials compared thrombolysis with placebo and often had a primary outcome measure of mortality at a certain number of days.
From Egbert’s scribblings:

Handy statistical lexicon These are all important methods and concepts related to statistics that are not as well known as they should be. I hope that by giving them names, we will make the ideas more accessible to people:
Mister P: Multilevel regression and poststratification.
The Secret Weapon: Fitting a statistical model repeatedly on several different datasets and then displaying all these estimates together.
The Superplot: Line plot of estimates in an interaction, with circles showing group sizes and a line showing the regression of the aggregate averages.
The Folk Theorem: When you have computational problems, often there’s a problem with your model.
The Pinch-Hitter Syndrome: People whose job it is to do just one thing are not always so good at that one thing.
Weakly Informative Priors: What you should be doing when you think you want to use noninformative priors.
P-values and U-values: They’re different.
Conservatism: In statistics, the desire to use methods that have been used before.

Data Marketplace It takes a lot of data of all kinds to add up to Big Data. That's why we've assembled this awesome collection of datasets and streams for data scientists and developers to sample, experiment with and use to create awesome analytics and applications. Whether you need to tap into social, geo or other kinds of data we've got just what you need. age, births, census, character, chemistry, commodities, corpora, death, demographics, economics, employment, football, geonames, government, health, housing, income, language, law, literature, locations, longitude, maps, music, national, pollution, population, science, size-large, social, spending, sports, statistics, survey, twitter, word, zipcode

Video: Survey Package in R Sebastián Duchêne presented a talk at Melbourne R Users on 20th February 2013 on the Survey Package in R. Talk Overview: Complex designs are common in survey data. In practice, collecting random samples from a population is costly and impractical. About the presenter: Sebastián Duchêne is a Ph.D. candidate at The University of Sydney, based at the Molecular Phylogenetics, Ecology, and Evolution Lab. See here for the full list of Melbourne R User Videos.

Toward sustainable insights, or why polygamy is bad for you | the morning paper Toward sustainable insights, or why polygamy is bad for you, Binnig et al., CIDR 2017 Buckle up! Today we’re going to be talking about statistics, p-values, and the multiple comparisons problem. For my own benefit, I’ll try and explain what follows as simply as possible; I find it incredibly easy to make mistakes otherwise! p-values If we observe some variable X and see value x, we might wonder “what are the odds of that!” Given a hypothesis H about the underlying distribution, we’d be able to give an answer: the probability of seeing x given that hypothesis, or P(X = x | H). Time to move on from dice rolls. Suppose the thing we observe is now a measure of correlation between two measured phenomena. Rather than asking whether it is exactly equal to some value, we need to ask ‘what are the odds of seeing a value ≥ x (or ≤ x)?’ given H. Suppose we see a suspiciously large value. Then p-value = P(X ≥ x | H) (source: wikipedia). Here’s the first thinking trap. An arbitrary but universally accepted p-value of 0.05 (there’s a 5% chance of this observation given the hypothesis) is deemed as the threshold for ‘statistical significance.’
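To make the p-value excerpt above concrete, here is a minimal Python sketch under stated assumptions. It treats a one-sided p-value as the upper-tail probability P(X ≥ x | H), where H is the hypothesis of a fair six-sided die and X counts sixes; the 30 rolls and 10 sixes are made-up illustrative numbers, not data from the paper. It also shows the standard arithmetic behind the multiple comparisons problem, assuming the tests are independent. The function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_success):
    """P(X >= successes | H), where H says X ~ Binomial(trials, p_success)."""
    return sum(
        comb(trials, k) * p_success ** k * (1 - p_success) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests of true
    null hypotheses falls below the significance threshold alpha."""
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    # Illustrative numbers only: 10 sixes observed in 30 rolls of a die
    # that the hypothesis H assumes to be fair.
    p_value = binomial_upper_tail(successes=10, trials=30, p_success=1 / 6)
    print(f"one-sided p-value: {p_value:.4f}")

    # The more hypotheses are tested, the likelier a spurious 'discovery'.
    for m in (1, 5, 20, 100):
        print(m, round(chance_of_false_positive(m), 3))
```

The second function is why a fixed 0.05 threshold becomes unreliable once many hypotheses are tried on the same data: even when every null hypothesis is true, the chance of at least one "significant" result grows quickly with the number of tests.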

Gapminder: Unveiling the beauty of statistics for a fact based world view.

Drinking, sex, eating: Why don't we tell the truth in surveys? 27 February 2013. Last updated at 13:56 GMT. By Brian Wheeler, BBC News Magazine. Many people are under-reporting how much alcohol they are drinking. But what else are we fibbing to researchers about and why do we do it? "I have the occasional sweet sherry. It is a classic British sitcom scene. But the tendency to paint a less-than-honest picture about your unhealthy habits and lifestyle is not just restricted to alcohol. It is understandable that people want to present a positive image of themselves to friends, family and colleagues. After all, the man or woman from the Office for National Statistics or Ipsos Mori can't order you to go on a diet or lay off the wine. It is a question that has been puzzling social scientists for decades. They even have a name for it - The Social Desirability Bias. "People respond to surveys in the way they think they ought to. The recycling never lies It is a particular problem when it comes to "sins" such as alcohol and food.

Majority to minority: the declining U.S. white population | N-IUSSP In this essay we document the demography of the decline of the white population in the United States, a country with a long history of white supremacy. Despite the fact that the U.S. Constitution and the civil rights legislation of the 1960s guaranteed equality to all people irrespective of race or ethnicity, everyone is far from equal in the United States today. On average, whites are far better off economically and educationally and in many other ways than are minority peoples. Levels of residential segregation by race and ethnicity are still nearly as high as they were decades ago. Donald Trump won the U.S. presidential election by focusing his attention on white people. Let us first note that U.S. federal government agencies use two questions to measure race/ethnicity. First immigrants to the U.S. Whites were not the first people to settle in what is now the United States. The first sizeable stream of immigrants to the U.S. were whites from England. Whites and minorities in the U.S.

What is a large enough random sample? With the well-deserved popularity of A/B testing, computer scientists are finally becoming practicing statisticians. One part of experiment design that has always been particularly hard to teach is how to pick the size of your sample. The two points that are hard to communicate are that:
The required sample size is essentially independent of the total population size.
The required sample size depends strongly on the strength of the effect you are trying to measure.
These things are only hard to explain because the literature is overly technical (too many buzzwords and too many irrelevant concerns) and these misapprehensions can’t be relieved unless you spend some time addressing the legitimate underlying concerns they are standing in for. As usual, explanation requires common ground (moving to shared assumptions), not mere technical bullying. We will try to work through these assumptions and then discuss proper sample size. The problem of population size. The problem of effect strength.
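The two points above can be checked numerically. The following Python sketch is not taken from the article; it uses two textbook normal-approximation formulas as stand-ins. The first is Cochran's sample size for estimating a proportion, with the finite population correction. The second is the usual per-group size for a two-sided comparison of two conversion rates. The function names, the 10% baseline rate and the default 95% confidence and 80% power are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def survey_sample_size(margin, p=0.5, confidence=0.95, population=None):
    """Cochran's sample size for estimating a proportion to within +/- margin.

    If population is given, apply the finite population correction.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


def ab_test_sample_size(rate_a, rate_b, alpha=0.05, power=0.80):
    """Approximate per-group size to detect rate_a vs rate_b (two-sided z-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (rate_a - rate_b) ** 2)


if __name__ == "__main__":
    # Point 1: the population can grow by orders of magnitude
    # while the required sample barely moves.
    for population in (10_000, 1_000_000, 100_000_000):
        print(population, survey_sample_size(0.05, population=population))

    # Point 2: halving the effect roughly quadruples the required sample.
    for lift in (0.05, 0.02, 0.01):
        print(lift, ab_test_sample_size(0.10, 0.10 + lift))
```

Both formulas scale with the inverse square of the quantity being resolved (the margin or the difference in rates), which is why effect strength dominates the calculation while population size only enters through a correction that vanishes for large populations.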

Beware of Zombie Statistics … Even When It’s Not Halloween (October 2017) Do women really own less than 2 percent of the world’s land? Do women constitute 70 percent of the world’s poor? Do women provide between 60 percent and 80 percent of the agricultural labor in Africa? Do widely cited statistics like these mean they are backed up with solid research? No, but they are repeated often enough that they have attained the status of official fact. Zombie statistics actually can be their own worst enemy. In her blog, Doss assessed the credibility of the claim that women own less than 2 percent of the world’s land. Women’s land rights are important, Doss says, but flawed data won’t resolve the issue. Another widely cited statistical zombie is that African women supply 60 percent to 80 percent of agricultural labor on the continent. A different zombie awoke when Carly Fiorina, a few months before she entered the GOP presidential campaign, said that “70 percent of the people living in abject poverty are women.”

Common Approaches for Analyzing Likert Scales and Other Categorical Data Analyzing Likert scale responses really comes down to what you want to accomplish (e.g. are you trying to provide a formal report with probabilities, or are you simply trying to understand the data better?). Sometimes a couple of graphs are sufficient and a formal statistical test isn’t even necessary. However, given how easy it is to conduct some of these statistical tests, it is best to just formalize the analysis. The code to set up the data for some testing is as follows.

set.seed(1234)
library(e1071)  # provides rdiscrete()
# Each column holds the response probabilities (values 1 to 5) for one group
probs <- cbind(c(.4,.2/3,.2/3,.2/3,.4), c(.1/4,.1/4,.9,.1/4,.1/4), c(.2,.2,.2,.2,.2))
my.n <- 100
my.len <- ncol(probs)*my.n
# Simulate my.n responses per group and stack them as (group, value) rows
raw <- NULL
for(i in 1:ncol(probs)){
  raw <- rbind(raw, cbind(i, rdiscrete(my.n, probs=probs[,i], values=1:5)))
}
raw <- data.frame(raw)
names(raw) <- c("group","value")
raw$group <- as.factor(raw$group)
# Keep only groups 1 and 2 for two-group comparisons
raw.1.2 <- subset(raw, raw$group %in% c(1,2))

I might as well get this one out of the way.

The consequences of not updating the census for more than a decade in Colombia A person with dark skin can be black, Afro-Colombian, Afro-descendant, a "libre", a "renaciente", a "palenquero", a "moreno", a "raizal", or part of the "costeñidad" in Colombia. African heritage and its subsequent mixing are understood in as many ways as there are sensibilities, although on paper this is hard to explain. The last time Colombians were counted was in the 2005 census carried out by DANE (Departamento Administrativo Nacional de Estadística, the National Administrative Department of Statistics). At that time a map was drawn in which the Afro population was just over 10% of the 41 million inhabitants recorded. A decade later, projections exceed 48 million and these communities represent between 18% and 20%, according to data from parallel institutions such as the Universidad del Valle in Cali. Almost 10 points of difference. “If we don't know how many of us there are, how are public policies going to be applied, and how can we claim our rights?”

Oh Ordinal data, what do we do with you? What can you do with ordinal data? Or more to the point, what shouldn’t you do with ordinal data? First of all, let’s look at what ordinal data is. It is usual in statistics and other sciences to classify types of data in a number of ways. Nominal is pretty straight-forward. Ordinal data But then we come to ordinal level of measurement. A postgraduate degree is higher than a Bachelor’s degree, which is higher than a high-school qualification, which is higher than no qualification. There are four steps on the scale, and it is clear that there is a logical sense of order. Another example of ordinal level of measurement is used extensively in psychological, educational and marketing research, known as a Likert scale. The question at the start of this post has an ordinal response, which could be perceived as indicating how quantitative the respondent believes ordinal data to be. Well! Here’s what I think: All ordinal data is not the same.
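As a small illustration of the four-step qualification scale described above, the following Python sketch encodes the levels so that only their order is used. The class name, the integer codes and the sample responses are hypothetical; the codes record rank only and make no claim that the steps are equally spaced. Summaries such as the low median and the mode rely on ordering and counting alone, so they sidestep the question of how quantitative the scale is.

```python
from enum import IntEnum
from statistics import median_low, mode


class Qualification(IntEnum):
    """Ordered levels; the integers encode rank, not distance between steps."""
    NONE = 0
    HIGH_SCHOOL = 1
    BACHELORS = 2
    POSTGRADUATE = 3


# Made-up survey responses, for illustration only.
responses = [
    Qualification.BACHELORS,
    Qualification.NONE,
    Qualification.HIGH_SCHOOL,
    Qualification.POSTGRADUATE,
    Qualification.HIGH_SCHOOL,
]

if __name__ == "__main__":
    # Order comparisons are meaningful for ordinal data.
    print(Qualification.POSTGRADUATE > Qualification.BACHELORS)

    # median_low always returns an actual category from the data,
    # and mode only counts occurrences.
    print(median_low(responses).name, mode(responses).name)
```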