An intro to power and sample size estimation -- Jones et al. 20 (5): 453 -- Emergency Medicine Journal. Correspondence to: Dr S R Jones, Emergency Department, Manchester Royal Infirmary, Oxford Road, Manchester M13 9WL, UK; steve.r.jones@bigfoot.com. Abstract: The importance of power and sample size estimation for study design and analysis. Understand power and sample size estimation. Power and sample size estimations are measures of how many patients are needed in a study. In previous articles in the series on statistics published in this journal, statistical inference has been used to determine if the results found are true or possibly due to chance alone. Power and sample size estimations are used by researchers to determine how many subjects are needed to answer the research question (or to test the null hypothesis). An example is the case of thrombolysis in acute myocardial infarction (AMI). Generally these trials compared thrombolysis with placebo and often had a primary outcome measure of mortality at a certain number of days. From Egbert’s scribblings:
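To make the idea of "how many patients are needed" concrete, here is a minimal sketch of a sample size estimate for a two-arm trial with a binary outcome such as mortality. It uses the common normal-approximation formula for comparing two proportions; the function name, the mortality rates, and the choices of a 5% two-sided significance level and 80% power are illustrative assumptions, not figures taken from the Jones et al. article.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate number of patients needed per arm (two-sided test).

    Normal-approximation formula:
        n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical mortality rates, chosen only for illustration.
n_small_effect = sample_size_two_proportions(0.12, 0.10)  # 2-point absolute reduction
n_large_effect = sample_size_two_proportions(0.12, 0.06)  # 6-point absolute reduction
print(n_small_effect, n_large_effect)
```

Under these assumptions the smaller treatment effect needs roughly ten times as many patients per arm as the larger one, which is why trials built around modest mortality differences tend to be very large.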

40 Techniques Used by Data Scientists These techniques cover most of what data scientists and related practitioners are using in their daily activities, whether they use solutions offered by a vendor, or whether they design proprietary tools. When you click on any of the 40 links below, you will find a selection of articles related to the entry in question. Most of these articles are hard to find with a Google search, so in some ways this gives you access to the hidden literature on data science, machine learning, and statistical science. Starred techniques (marked with a *) belong to what I call deep data science, a branch of data science that has little if any overlap with closely related fields such as machine learning, computer science, operations research, mathematics, or statistics. Also, to discover in which contexts and applications the 40 techniques below are used, I invite you to read the following articles: The 40 data science techniques

SQLZOO Learn SQL using: SQL Server, Oracle, MySQL, DB2, and PostgreSQL. Reference: how to... How to read the data from a database. 2 CREATE and DROP How to create tables, indexes, views and other things. How to get rid of them. 3 INSERT and DELETE How to put records into a table, change them and how to take them out again. 4 DATE and TIME How to work with dates; adding, subtracting and formatting. 5 Functions How to use string functions, logical functions and mathematical functions. 6 Users How to create users, GRANT and DENY access, get at other people's tables. 7 Meta Data How to find out what tables and columns exist. 8 SQL Hacks Some SQL Hacks, taken from "SQL Hacks" published by O'Reilly 9 Using SQL with PHP on Amazon EC2 servers Video tutorials showing how to run MySQL, PHP and Apache on Amazon's EC2 cloud servers. 10 An introduction to transactions Video tutorials showing how sessions can interfere with each other and how to stop it. 11 Using SQL with C# in Visual Studio

Video: Survey Package in R Sebastián Duchêne presented a talk at Melbourne R Users on 20th February 2013 on the Survey Package in R. Talk Overview: Complex designs are common in survey data. In practice, collecting random samples from a population is costly and impractical. About the presenter: Sebastián Duchêne is a Ph.D. candidate at The University of Sydney, based at the Molecular Phylogenetics, Ecology, and Evolution Lab. See here for the full list of Melbourne R User Videos. Handy statistical lexicon These are all important methods and concepts related to statistics that are not as well known as they should be. I hope that by giving them names, we will make the ideas more accessible to people: Mister P: Multilevel regression and poststratification. The Secret Weapon: Fitting a statistical model repeatedly on several different datasets and then displaying all these estimates together. The Superplot: Line plot of estimates in an interaction, with circles showing group sizes and a line showing the regression of the aggregate averages. The Folk Theorem: When you have computational problems, often there’s a problem with your model. The Pinch-Hitter Syndrome: People whose job it is to do just one thing are not always so good at that one thing. Weakly Informative Priors: What you should be doing when you think you want to use noninformative priors. P-values and U-values: They’re different. Conservatism: In statistics, the desire to use methods that have been used before. P.S.

www.iki.fi/sol - Tutorials - GalaXQL Who said SQL tutorials have to be boring? Try out GalaXQL 3.0 beta! Runs on your browser. Note: heavy javascript and webgl. Quotes / Testimonials "Incidentally, we've trained several students to be web developers using only your tutorial for SQL instruction--great work!" -- Dr Christopher Pound, Rice University "I have been looking for a good way to show SQL to analysts who need to learn it and this by far the best tool I have ever come across." -- Julie LeMay, DELL "Noodling with GalaXQL is the most fun database tutorial I've ever seen." -- Joey deVilla at Tucows' "the farm" "Much more entertaining and freeform than ordinary attempts at tutorials, certainly exciting!" -- Thomas Van Der Pol "I've just completed your tutorial and was very impressed!" -- Stephed Bridges "[GalaXQL] rocks!" -- Reuben Grinberg in his blog GalaXQL is an interactive SQL tutorial. GalaXQL 1.0 Virtual teacher (win32) GalaXQL 1.0 Virtual teacher (mac os x) Follow the instructions by your virtual teacher. Somewhat altered galaxy

Drinking, sex, eating: Why don't we tell the truth in surveys? 27 February 2013. Last updated at 13:56 GMT. By Brian Wheeler, BBC News Magazine. Many people are under-reporting how much alcohol they are drinking. But what else are we fibbing to researchers about and why do we do it? "I have the occasional sweet sherry." It is a classic British sitcom scene. But the tendency to paint a less-than-honest picture about your unhealthy habits and lifestyle is not restricted to alcohol. It is understandable that people want to present a positive image of themselves to friends, family and colleagues. After all, the man or woman from the Office for National Statistics or Ipsos Mori can't order you to go on a diet or lay off the wine. It is a question that has been puzzling social scientists for decades. They even have a name for it - The Social Desirability Bias. "People respond to surveys in the way they think they ought to." The recycling never lies It is a particular problem when it comes to "sins" such as alcohol and food.

Toward sustainable insights, or why polygamy is bad for you | the morning paper. Binning et al., CIDR 2017. Buckle up! Today we’re going to be talking about statistics, p-values, and the multiple comparisons problem. For my own benefit, I’ll try and explain what follows as simply as possible – I find it incredibly easy to make mistakes otherwise! p-values: If we observe some variable X and see value x, we might wonder “what are the odds of that!” [...] we’d be able to give an answer. [...] about the underlying distribution. [...] will be given that hypothesis, or [...]. Time to move on from dice rolls. [...] we observe is now a measure of correlation between two measured phenomena. [...] exactly equal to some value we need to ask ‘what are the odds of seeing a value ≥ x (or ≤ x)?’ Suppose we see a suspiciously large value. p-value = Pr(X ≥ x | H) (source: wikipedia). Here’s the first thinking trap. An arbitrary but universally accepted p-value of 0.05 (there’s a 5% chance of an observation at least this extreme given the hypothesis) is deemed as the threshold for ‘statistical significance.’
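The multiple comparisons problem mentioned above can be illustrated with a little arithmetic. The sketch below assumes every test is independent and every null hypothesis is true, and shows how quickly the chance of at least one false positive grows at the 0.05 threshold. The function names are hypothetical, and the Bonferroni correction shown is one standard textbook remedy, not necessarily the approach Binning et al. propose.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


for m in (1, 5, 20, 100):
    print(m, round(familywise_error_rate(m), 3), bonferroni_threshold(m))
```

With 20 independent looks at the data, the chance of at least one spurious "significant" result is already well above one half, even though each individual test uses the conventional 0.05 threshold.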

Python Programming Language – Official Website What is a large enough random sample? With the well deserved popularity of A/B testing, computer scientists are finally becoming practicing statisticians. One part of experiment design that has always been particularly hard to teach is how to pick the size of your sample. The two points that are hard to communicate are that: The required sample size is essentially independent of the total population size. The required sample size depends strongly on the strength of the effect you are trying to measure. These things are only hard to explain because the literature is overly technical (too many buzzwords and too many irrelevant concerns) and these misapprehensions can’t be relieved unless you spend some time addressing the legitimate underlying concerns they are standing in for. As usual, explanation requires common ground (moving to shared assumptions), not mere technical bullying. We will try to work through these assumptions and then discuss proper sample size. The problem of population size. The problem of effect strength.
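Both points can be checked numerically. The sketch below is an illustration under stated assumptions rather than the article's own derivation: it uses the textbook sample size formula for estimating a proportion to within a given margin of error, with the standard finite population correction, and treats the margin of error as a stand-in for the size of the effect you need to resolve. The function name and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, population=None, confidence=0.95, p=0.5):
    """Sample size needed to estimate a proportion to within +/- margin.

    p=0.5 is the worst case (largest variance). If population is given,
    the finite population correction is applied.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)  # finite population correction
    return ceil(n)


# Population size barely matters once the population is large...
for population in (10_000, 1_000_000, 100_000_000):
    print(population, required_sample_size(0.03, population))

# ...but the precision you need matters a great deal.
for margin in (0.05, 0.03, 0.01):
    print(margin, required_sample_size(margin))
```

Under these assumptions, growing the population a hundredfold changes the answer by only a handful of respondents, while tightening the margin from 3 points to 1 point multiplies the required sample by about nine, because the size scales with one over the square of the margin.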

Majority to minority: the declining U.S. white population | N-IUSSP In this essay we document the demography of the decline of the white population in the United States, a country with a long history of white supremacy. Despite the fact that the U.S. Constitution and the civil rights legislation of the 1960s guaranteed equality to all people irrespective of race or ethnicity, everyone is far from equal in the United States today. On average, whites are far better off economically and educationally and in many other ways than are minority peoples. Levels of residential segregation by race and ethnicity are still nearly as high as they were decades ago. Donald Trump won the U.S. presidential election by focusing his attention on white people. Let us first note that U.S. federal government agencies use two questions to measure race/ethnicity. First immigrants to the U.S. Whites were not the first people to settle in what is now the United States. The first sizeable stream of immigrants to the U.S. were whites from England. Whites and minorities in the U.S.

Common Approaches for Analyzing Likert Scales and Other Categorical Data Analyzing Likert scale responses really comes down to what you want to accomplish (e.g., are you trying to provide a formal report with probabilities, or are you simply trying to understand the data better?). Sometimes a couple of graphs are sufficient and a formal statistical test isn’t even necessary. However, given how easy it is to conduct some of these statistical tests, it is best to just formalize the analysis. The code to set up the data for some testing is as follows.

    set.seed(1234)
    library(e1071)  # provides rdiscrete()
    # Each column holds the response probabilities (values 1 to 5) for one group
    probs <- cbind(c(.4,.2/3,.2/3,.2/3,.4), c(.1/4,.1/4,.9,.1/4,.1/4), c(.2,.2,.2,.2,.2))
    my.n <- 100
    my.len <- ncol(probs)*my.n
    raw <- NULL
    # Draw my.n responses per group and stack them as (group, value) rows
    for(i in 1:ncol(probs)){
      raw <- rbind(raw, cbind(i, rdiscrete(my.n, probs=probs[,i], values=1:5)))
    }
    raw <- data.frame(raw)
    names(raw) <- c("group","value")
    raw$group <- as.factor(raw$group)
    raw.1.2 <- subset(raw, raw$group %in% c(1,2))

I might as well get this one out of the way.

Beware of Zombie Statistics … Even When It’s Not Halloween (October 2017) Do women really own less than 2 percent of the world’s land? Do women constitute 70 percent of the world’s poor? Do women provide between 60 percent and 80 percent of the agricultural labor in Africa? Do widely cited statistics like these mean they are backed up with solid research? No, but they are repeated often enough that they have attained the status of official fact. Zombie statistics actually can be their own worst enemy. In her blog, Doss assessed the credibility of the claim that women own less than 2 percent of the world’s land. Women’s land rights are important, Doss says, but flawed data won’t resolve the issue. Another widely cited statistical zombie is that African women supply 60 percent to 80 percent of agricultural labor on the continent. A different zombie awoke when Carly Fiorina, a few months before she entered the GOP presidential campaign, said that “70 percent of the people living in abject poverty are women.”