
Statistics


R

Wp. So You Think You Have a Power Law — Well Isn't That Special. Easy statistics for AdWords A/B testing, and hamsters. So you’ve got your AdWords test all set up: Will people go for the headline “Code Review Tools” or “Tools for Code Review?” Gee they’re both so exciting! Who could choose! I know, I know, settle down. This is how these things go. Anyway, the next day you have 32 clicks on variant A (“Code Review Tools”) and 19 clicks on B (“Tools for Code Review”). Is that conclusive? Has A won? The answer matters. Normally a formal statistical treatment would be too difficult, but I’m here to rescue you with a statistically sound yet incredibly simple formula that will tell you whether or not your A/B test results really are indicating a difference. I’ll get to it in a minute, but I can’t help but include a more entertaining example than AdWords. In the movie, Hammy chooses the organic produce 8 times and the conventional 4 times.
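The simple formula the article promises is not shown in this excerpt, so the sketch below swaps in an exact two-sided binomial test instead. It assumes that, if the two variants were really equivalent, each click (or each of Hammy's choices) would be a fair coin flip. The function name is hypothetical.

```python
from math import comb


def two_sided_binomial_p(a, b):
    """Exact two-sided p-value for H0: each trial is equally likely to go to A or B.

    Assumption: trials are independent fair coin flips under H0.
    """
    n = a + b
    k = max(a, b)
    # Probability of a split at least as lopsided as k out of n, in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null distribution is symmetric, so doubling the tail gives the two-sided value.
    return min(1.0, 2 * upper_tail)


p_adwords = two_sided_binomial_p(32, 19)  # clicks on A vs. B
p_hammy = two_sided_binomial_p(8, 4)      # organic vs. conventional
print(p_adwords, p_hammy)
```

For Hammy's 8 versus 4 the p-value is 794/2048, about 0.39, and the 32 versus 19 click split also comes out above the conventional 0.05 cutoff, so under this test neither result is conclusive.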

If you’re like me, you probably think “organic” is the clear-cut winner — after all Hammy chose it twice as often as conventional veggies. Okay okay, we suck at math. P.S. Nassim Nicholas Taleb Home and Professiona. ScholarlyCommons - Shaun Lysen: Permuted Inclusion Criterion: A. Abstract We introduce a new variable selection technique called the Permuted Inclusion Criterion (PIC) based on augmenting the predictor space X with a row-permuted version denoted Xpi. We adopt the linear regression setup with n observations on p variables. Thus, our augmented space has p real predictors and p permuted predictors. This has many desirable properties for variable selection.
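The augmentation step in the abstract above can be sketched in a few lines. This only builds the augmented predictor matrix (the p real columns followed by p columns taken from a row-permuted copy of X). How PIC then uses the permuted columns to decide which variables to keep is not described in this excerpt and is not attempted here. The function name and the `seed` parameter are hypothetical.

```python
import random


def augment_with_permuted_rows(X, seed=0):
    """Return [X | Xpi], where Xpi is X with its rows shuffled as whole units.

    X is a list of n rows, each a list of p predictor values.
    Assumption: one permutation is applied to entire rows, so the permuted
    predictors keep their relationships to each other but are no longer
    aligned with the original observations.
    """
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    return [list(row) + list(X[j]) for row, j in zip(X, order)]


X = [[1, 2], [3, 4], [5, 6], [7, 8]]
augmented = augment_with_permuted_rows(X)
for row in augmented:
    print(row)
```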

Recommended Citation Lysen, Shaun, "Permuted Inclusion Criterion: A Variable Selection Technique" (2009). Good Data and Flawed Conclusions. Jonathan Baron's R help page. Wikimania2007-SethAnthony.pdf. Blog - Nick Jenkins » Blog Archive » Wikimania Talk notes: “Wher. I’ll copy and paste my notes for some of the talks at Wikimania 2007 here, in case it’s helpful so that everyone can follow what’s going on. As such they will be in point / summary form, rather than well-formed prose: Talk: “Where have all the editors gone?” by Seth Anthony. Background in chemistry education. Who adds real content to the Wikipedia? Not all edits are created equal. Made a study using a sample of edits to the Wikipedia. Facts & figures on findings: 28% of edits: outside article namespace. 10% article talk pages. 62% article namespace. (i.e. 1/3 of the edits are not about the articles) Breakdown of edits: 5% vandalism. 45% of edits are tweaking / minor changes / adding categories. 12% content creation.

So only 12% of edits create fresh content. Of these 12%, was most interested in this, so broke this down: 0% were made by admins. 69% were registered users. 31% were created by anon users, or non-logged in users. … and only 52% were by people who had a user page. Admins Content creators.

When an experimental study states "The group with treatment X had significantly less disease (p = 1%)", many people interpret this statement as being equivalent to "there is a 99% chance that if I do treatment X it will prevent disease." This essay explains why these statements are not equivalent. For such an experiment, all of the following are possible: X is in fact an effective treatment as claimed. X is only effective for some people, but not for me, because I am different in a way that the experiment failed to distinguish. X is ineffective, and only looked effective due to random chance. X is ineffective because of a systematic flaw in the experiment. Warning Sign D1: Lack of a Randomized Controlled Trial. The most reliable experiment to evaluate a medical treatment is a randomized controlled trial, in which a population is randomly divided into a test group, which receives the treatment, and a control group, which does not. Why are controls important? Alf Landon Carl Sagan R.A.
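One way to see why "p = 1%" is not "99% chance the treatment works" is a small Bayes' rule calculation. The sketch below covers only the random-chance explanation from the list above, not systematic flaws. The prior share of effective treatments and the study's power are made-up inputs for illustration, not figures from the essay, and the function name is hypothetical.

```python
def prob_effective_given_significant(prior, power, alpha):
    """P(treatment is effective | study reports a significant result).

    prior: assumed share of tested treatments that really work
    power: assumed chance a study detects a treatment that works
    alpha: chance an ineffective treatment looks significant by luck
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Same 1% significance threshold, two different (assumed) priors.
print(prob_effective_given_significant(prior=0.10, power=0.8, alpha=0.01))
print(prob_effective_given_significant(prior=0.01, power=0.8, alpha=0.01))
```

With these assumed inputs the answer depends heavily on the prior: it is close to 0.90 when one in ten tested treatments works and below 0.5 when one in a hundred does, even though the significance threshold is 1% in both cases.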

Three-Toed Sloth. Simulation V: Matching Simulation Models to Data (Introduction to Statistical Computing) (My notes for this lecture are too incomplete to be worth typing up, so here's the sketch.) Methods, Models, Simulations Statistical methods try to draw inferences about unknown realities from data. They always involve some more or less tightly specified stochastic model, some story (myth) about how the data were generated, and so how the data are linked to the world. We want to know how well our methods work, which is usually hard to assess by just running them on real data.
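A minimal sketch of assessing a method by simulation rather than on real data: generate many data sets from a model whose parameter is known, run the estimator on each, and see how far the estimates land from the truth. The exponential model, the mean-matching estimator, and all names and numbers below are my choices for illustration and are not taken from the lecture.

```python
import random
from statistics import mean


def moment_estimate_rate(sample):
    """Estimate an exponential rate by matching the sample mean to E[X] = 1/rate."""
    return 1.0 / mean(sample)


def simulate_estimator(true_rate, n, reps, seed=0):
    """Run the estimator on `reps` simulated data sets of size `n`."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        estimates.append(moment_estimate_rate(sample))
    return estimates


true_rate = 2.0
estimates = simulate_estimator(true_rate, n=200, reps=500)
# Root-mean-square error of the estimator, measurable only because the truth is known.
rmse = mean((e - true_rate) ** 2 for e in estimates) ** 0.5
print(mean(estimates), rmse)
```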

Sometimes, we can push through analytic probability calculations under our model assumptions, but often not. Adjusting Simulation Models to Match Data Many (most?) Method of Moments and Friends Moments (\mathbb{E}[X], \mathbb{E}[X^2], \ldots) are functionals of the probability distribution. So far I've said nothing about simulation. Readings. WikiXRay. The main goal of this project is to develop a robust and extensible software tool for an in-depth quantitative analysis of the whole Wikipedia project. This project is currently developed by José Felipe Ortega (GlimmerPhoenix) at the Libresoft Group, at Universidad Rey Juan Carlos.

Downloading the 7zip database dump of the target language version. Construction and decompression of the database dump on local storage media. Creating additional database tables with useful statistics and quantitative information. Generating graphics and data files with quantitative results, adequately organized in a per-language directory substructure. Some of these capabilities still require manual insertion of parameters in a common configuration file, though some of them work automatically. The source code is publicly available under the GNU GPL license, and can be found in the LibreSoft tools git repository, or alternatively on the WikiXRay project at Gitorious. Mediation (David A. Kenny) Some might benefit from Muthén (2011). Note that both the CDE and the NDE would equal the regression slope, or what was earlier called path c', if the model is linear, assumptions are met, and there is no XM interaction affecting Y; in that case the NIE would equal ab, and the TE would equal ab + c'.

In the case in which the specifications made by the traditional mediation approach (e.g., linearity, no omitted variables, no XM interaction) hold, the estimates would be the same. Here I give the general formulas for the NDE and NIE when X is intervally measured, based on Valeri & VanderWeele (2013). If the XM effect is added to the Y equation, that equation can be stated as and the intercept in the M equation can be denoted as iM. Where X0 is a theoretical baseline score on X or a "zero" score and X1 is a theoretical "improvement" score on X or "1" score.
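The formulas themselves did not survive in this excerpt. As a hedged sketch, the code below uses the standard linear-model result with an XM interaction and no covariates, assuming M = iM + aX and Y = iY + c'X + bM + dXM. The name d for the interaction coefficient, the function name, and the parameter names are my assumptions; a, b, c' and iM follow the path labels used in the text.

```python
def natural_effects(a, b, c_prime, d, i_m, x0=0.0, x1=1.0):
    """Return (NDE, NIE, TE) for a linear mediation model with an XM interaction.

    Assumed model (no covariates):
        M = i_m + a * X
        Y = i_y + c_prime * X + b * M + d * X * M
    NDE = (c' + d * (i_m + a * x0)) * (x1 - x0)
    NIE = a * (b + d * x1) * (x1 - x0)
    TE  = NDE + NIE
    """
    delta = x1 - x0
    nde = (c_prime + d * (i_m + a * x0)) * delta
    nie = a * (b + d * x1) * delta
    return nde, nie, nde + nie


# With no interaction (d = 0) and a one-unit change in X, the results
# reduce to the traditional values: NDE = c', NIE = ab, TE = ab + c'.
print(natural_effects(a=0.5, b=0.4, c_prime=0.3, d=0.0, i_m=1.0))
# With an interaction, the effects depend on i_m, x0 and x1.
print(natural_effects(a=0.5, b=0.4, c_prime=0.3, d=0.2, i_m=1.0))
```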

When X is a dichotomy, it is fairly obvious what values to use for X0 and X1. Brian.Shaler.name | Blog. December 31, 2006 | Tags: digg, diggstatus, statistics, data, analysis. The Experiment. Saturday, December 9th, I decided to run an experiment. The experiment was intended to do several things: It needed to chronicle the Digg Effect. This has been done many times before, so I needed to come up with something that would provide more information than a typical traffic chart. I wanted to know more about the Digg community, and how most people use the site.

There has been plenty of coverage of the Top Users, but nothing that really shows their stats in the context of the entire Digg user base. It needed to determine the profitability of "blog spamming" by tracking the ad revenue of one Google AdSense advertisement while being linked to on the Digg.com front page. We have all seen people post a summary of a news story on their ad-infested blog and post the link to Digg. First, I had to build a web application that would scrape and cache user statistics from Digg. Margin of Error and Confidence Levels Made Simple. Pamela Hunter, February 26, 2010. A survey is a valuable assessment tool in which a sample is selected and information from the sample can then be generalized to a larger population. Surveying has been likened to taste-testing soup – a few spoonfuls tell what the whole pot tastes like. The key to the validity of any survey is randomness. Just as the soup must be stirred in order for the few spoonfuls to represent the whole pot, when sampling a population, the group must be stirred before respondents are selected.
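For a well-stirred (simple random) sample, the margin of error named in the article's title is commonly approximated as z * sqrt(p * (1 - p) / n). The sketch below uses that textbook formula. The sample size of 400 is a made-up input, and z = 1.96 is the usual value for a 95 percent confidence level. The function name is hypothetical.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    p: observed proportion (0.5 gives the widest margin)
    n: number of randomly selected respondents
    z: normal critical value (1.96 for a 95 percent confidence level)
    Assumption: simple random sampling from a large population.
    """
    return z * sqrt(p * (1.0 - p) / n)


moe = margin_of_error(p=0.5, n=400)
print(round(moe, 3))  # 0.049, i.e. about plus or minus 5 percentage points
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.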

It is critical that respondents be chosen randomly so that the survey results can be generalized to the whole population. How well the sample represents the population is gauged by two important statistics – the survey’s margin of error and confidence level. For example, Company X surveys customers and finds that 50 percent of the respondents say its customer service is “very good.” Sample Size and the Margin of Error. Calculating Margin of Error for Individual Questions. A Bayesian Truth Serum for Subjective Data -- Prelec 306 (5695): Why Kendall Tau. The Statistics Homepage.

"Thank you and thank you again for providing a complete, well-structured, and easy-to-understand online resource. Every other website or snobbish research paper has not deigned to explain things in words consisting of less than four syllables. I was tossed to and fro like a man holding on to a frail plank that he calls his determination until I came across your electronic textbook...You have cleared the air for me. You have enlightened. You have illuminated. You have educated me." — Mr. "As a professional medical statistician of some 40 years standing, I can unreservedly recommend this textbook as a resource for self-education, teaching and on-the-fly illustration of specific statistical methodology in one-to-one statistical consulting." — Mr. "Excellent book." — Dr. "Just wanted to congratulate whoever wrote the 'Experimental Design' page." — James A. StatSoft has freely provided the Electronic Statistics Textbook as a public service since 1995. Name Statistics - How popular are your fir. Clustering - Mixture of Gaussian.

The Internet Glossary of Statistical Term. Free Statistics - Free Statistica. Introduction to the Scientific Method. The scientific method is the process by which scientists, collectively and over time, endeavor to construct an accurate (that is, reliable, consistent and non-arbitrary) representation of the world. Recognizing that personal and cultural beliefs influence both our perceptions and our interpretations of natural phenomena, we aim through the use of standard procedures and criteria to minimize those influences when developing a theory. As a famous scientist once said, "Smart people (like smart lawyers) can come up with very good explanations for mistaken points of view." In summary, the scientific method attempts to minimize the influence of bias or prejudice in the experimenter when testing a hypothesis or a theory.

If the experiments bear out the hypothesis it may come to be regarded as a theory or law of nature (more on the concepts of hypothesis, model, theory and law below). Errors in experiments have several sources. Correlation. Time Series Analysi. Cnn.com/SPECIALS/2005/online.evolution/int...