Stats

Handy statistical lexicon. These are all important methods and concepts related to statistics that are not as well known as they should be.

Handy statistical lexicon

I hope that by giving them names, we will make the ideas more accessible to people:
Mister P: Multilevel regression and poststratification.
The Secret Weapon: Fitting a statistical model repeatedly on several different datasets and then displaying all these estimates together.
The Superplot: Line plot of estimates in an interaction, with circles showing group sizes and a line showing the regression of the aggregate averages.
The Folk Theorem: When you have computational problems, often there’s a problem with your model.
The Pinch-Hitter Syndrome: People whose job it is to do just one thing are not always so good at that one thing.
Weakly Informative Priors: What you should be doing when you think you want to use noninformative priors.
P-values and U-values: They’re different.
Conservatism: In statistics, the desire to use methods that have been used before.
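As a rough illustration of the Secret Weapon entry above, the sketch below fits the same simple regression to several small datasets and prints the slope estimates side by side. The dataset labels and numbers are made up for the example, and `ols_slope` is a hypothetical helper, not something taken from the lexicon itself.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Least-squares slope of y on x for one dataset."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Made-up datasets (for illustration only): the same predictor measured in three surveys.
datasets = {
    "survey A": ([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]),
    "survey B": ([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]),
    "survey C": ([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]),
}

# Fit the same model to each dataset, then show all the estimates together.
slopes = {label: ols_slope(xs, ys) for label, (xs, ys) in datasets.items()}
for label, slope in slopes.items():
    print(f"{label}: slope = {slope:+.2f}")
```

In practice the estimates would usually be plotted with uncertainty intervals rather than printed; the point is only that one model is fit many times and the results are compared in a single display.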

P.S. 40 Techniques Used by Data Scientists. These techniques cover most of what data scientists and related practitioners are using in their daily activities, whether they use solutions offered by a vendor, or whether they design proprietary tools.

40 Techniques Used by Data Scientists

When you click on any of the 40 links below, you will find a selection of articles related to the entry in question. Most of these articles are hard to find with a Google search, so in some ways this gives you access to the hidden literature on data science, machine learning, and statistical science. Asdfree.

R Stat

Learn Data Science: Tutorials, Online Courses, Books, & More. Toward sustainable insights, or why polygamy is bad for you (Binning et al., CIDR 2017). Buckle up!

Toward sustainable insights, or why polygamy is bad for you

Today we’re going to be talking about statistics, p-values, and the multiple comparisons problem. Some good background resources here are: For my own benefit, I’ll try to explain what follows as simply as possible – I find it incredibly easy to make mistakes otherwise! Let’s start with a very quick recap of p-values. Anscombe's quartet. All four sets are identical when examined using simple summary statistics, but vary considerably when graphed. Anscombe's quartet comprises four datasets that have nearly identical simple descriptive statistics, yet appear very different when graphed.

Anscombe's quartet

Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. He described the article as being intended to attack the impression among statisticians that "numerical calculations are exact, but graphs are rough." Summer Olympics: Legends - Microsoft Power BI Community.
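The claim about Anscombe's quartet can be checked numerically. The sketch below computes the mean, the sample variance of x, and the Pearson correlation for each of the four datasets. The data values are the commonly reproduced ones from Anscombe's 1973 paper, typed in here as an assumption since the text above does not list them; `pearson_r` is a helper written for this example.

```python
from statistics import mean, variance

# Anscombe's quartet as commonly reproduced (assumed values, not given in the text above).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# One summary row per dataset: (mean of x, mean of y, variance of x, correlation).
summary = {
    name: (mean(xs), mean(ys), variance(xs), pearson_r(xs, ys))
    for name, (xs, ys) in ANSCOMBE.items()
}

for name, (mx, my, vx, r) in summary.items():
    print(f"{name:>3}: mean x = {mx:.2f}, mean y = {my:.2f}, var x = {vx:.2f}, r = {r:.3f}")
```

The four rows come out nearly identical, which is exactly why the summary numbers alone are misleading; a scatter plot of each dataset is what reveals the differences.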

The consequences of not updating the census for more than a decade in Colombia. Free Statistics Book. Big Data, Data Mining, Predictive Analytics, Statistics, StatSoft Electronic Textbook. This free ebook has been provided as a public service since 1995.

Big Data, Data Mining, Predictive Analytics, Statistics, StatSoft Electronic Textbook

Statistics: Methods and Applications textbook offers training in the understanding and application of statistics and data mining. Mathematicians mapped out every “Game of Thrones” relationship to find the main character — Quartz. Fans of the Game of Thrones books and TV series have long quarreled over who the true hero of the story is.

Mathematicians mapped out every “Game of Thrones” relationship to find the main character — Quartz

Daenerys? Tyrion? Jon Snow? Hodor? Every time a character seems to be developing into a protagonist, he or she is brutally killed. But several main characters remain. Andrew J. The books and HBO fantasy series, with their massive cast of characters, various shifting allegiances, and intricate relationship dynamics, were a perfect fit to be studied mathematically. “This is a fanciful application of network science,” Beveridge told Quartz. The pair started by connecting characters every time they “interacted” in the third book of the series, A Storm of Swords. The resulting network structure broke the characters into extremely accurate communities that show the geographical, familial, and even adversarial ties between them.
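The step of "connecting characters every time they interacted" can be sketched as building a weighted, undirected network from a list of interaction pairs. The interactions below are invented purely for illustration, and ranking by weighted degree is an assumed, very simple importance measure; the text above does not say which measures the authors used.

```python
from collections import Counter

# Invented interaction records (illustration only, not data from the book).
interactions = [
    ("Tyrion", "Jon Snow"),
    ("Tyrion", "Daenerys"),
    ("Jon Snow", "Tyrion"),
    ("Jon Snow", "Hodor"),
    ("Tyrion", "Hodor"),
]

def build_network(pairs):
    """Count interactions per unordered pair; each count is an edge weight."""
    return Counter(frozenset(pair) for pair in pairs)

def weighted_degree(edge_weights):
    """Sum the weights of all edges touching each character."""
    totals = Counter()
    for pair, weight in edge_weights.items():
        for name in pair:
            totals[name] += weight
    return totals

network = build_network(interactions)
ranking = weighted_degree(network)

# Characters with more (and repeated) interactions rank higher.
for name, score in ranking.most_common():
    print(f"{name}: {score}")
```

Using `frozenset` for each pair makes the edge undirected, so ("Tyrion", "Jon Snow") and ("Jon Snow", "Tyrion") add to the same weight.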

“We didn’t tell it what the communities were, the network actually tells you what the communities are,” Beveridge said. Math 101: A reading list for lifelong learners. Ready to level up your working knowledge of math?

Math 101: A reading list for lifelong learners

Here’s what to read now, and next. Math 101, with Jennifer Ouellette. First, start with these 5 books… 1. Number: The Language of Science, Tobias Dantzig (Plume, 2007). A Lesson in the Errors of Statistical Thinking: Nate Silver on Trump. Alyssa frazee. Thu 02 January 2014. My sister is a senior undergraduate majoring in sociology.

alyssa frazee

She just landed an awesome analyst job for next semester and was told she'll be using some R in the course of her work. She asked me to show her the ropes during winter vacation, and of course I said yes! [1505.04714] Alan M. Turing: The Applications of Probability to Cryptography. Photomath. Mathpix! Introduction to Metadata. Metadata is organised information that describes, locates or otherwise makes it easier to retrieve information.

Introduction to Metadata

Top 50 Free Statistical Software. How I Acted Like A Pundit And Screwed Up On Donald Trump. Since Donald Trump effectively wrapped up the Republican nomination this month, I’ve seen a lot of critical self-assessments from empirically minded journalists — FiveThirtyEight included, twice over — about what they got wrong on Trump.

How I Acted Like A Pundit And Screwed Up On Donald Trump

This instinct to be accountable for one’s predictions is good since the conceit of “data journalism,” at least as I see it, is to apply the scientific method to the news. That means observing the world, formulating hypotheses about it, and making those hypotheses falsifiable. Best Data Science Online Courses. The number of online data science courses has exploded in recent years, and there are courses for any need. Here is an extensive list of free and paid courses from Coursera, DataCamp, Dataquest, edX, Udacity, Udemy, and other major providers.

By Brendan Martin (LearnDataSci). The following is a comprehensive list of Data Science courses and resources that explain or teach skills within Data Science, such as machine learning, data mining, analytics, cleaning, visualization, scraping, using APIs to make data products, artificial intelligence, and much more. Please excuse our appearance. We want to keep the list here for your reference while improving it live, so you may notice that some sections are inconsistent. Also, we would like you to know that some of the links to courses here are affiliate links. Coursera. Dr. Wolfgang Rolke Home Page. Professor, Department of Mathematical Sciences, University of Puerto Rico - Mayaguez. Research: My main research area is the statistical analysis of data from high energy physics experiments.

I am an associate member of the CMS collaboration, a high energy physics experiment at the Large Hadron Collider at CERN, Geneva, Switzerland. Publications and Presentations: For a list of my publications and to download some of the related routines, go here. Apps and Online Tools: I have written well over 30 apps in R shiny. Teaching: Here are links to the web pages I use for my courses: Database System Rankings 2016. Introduction to Data Quality.

How many times have you heard managers and colleagues complain about the quality of the data in a particular report, system or database? People often describe poor quality data as unreliable or not trustworthy. Defining exactly what high or low quality data is, why it is a certain quality level and how to manage and improve it is often a trickier task. Data Science - Part II - Working with R & R Studio. The Season for Sharing Data: Working with the newly released Census 2010-2014 ACS 5 year data in R. On December 3, 2015 the U.S. Census Bureau released the 2010-2014 5 year ACS (American Community Survey) data. You can read all about it on the Census website. This fantastic five-year statistical database provides aggregate social and economic characteristics about American individuals and families down to the block group level.

A number of online tools provide access to the ACS 2010-2014 data using graphical user interfaces (GUIs). These include the Census American FactFinder tool and Social Explorer. 20 Big Data Repositories You Should Check Out. RepASA. Deducer: A GUI for R - Deducer Manual. LeaRning Path on R - Step by Step Guide to Learn Data Science on R. One of the common problems people face in learning R is the lack of a structured path. They don’t know where to start, how to proceed, or which track to choose.

Although there is an abundance of good free resources available on the Internet, this can be overwhelming and confusing at the same time. To create this R learning path, Analytics Vidhya and DataCamp sat down together and selected a comprehensive set of resources to help you learn R from scratch. This learning path is a great introduction for anyone new to data science or R, and if you are a more experienced R user you will be updated on some of the latest advancements. This will help you learn R quickly and efficiently. Step 0: Warming up. Before starting your journey, the first question to answer is: Why use R?

R is a fast-growing open source competitor to commercial software packages like SAS, STATA and SPSS. Watch this 90-second video from Revolution Analytics to get an idea of how useful R could be. (Need a GUI? Weka 3 - Data Mining with Open Source Machine Learning Software in Java. AFAMaC-Matemáticas. Mass Shooting Tracker. Gun Violence Archive. HyperStat Online: An Introductory Statistics Textbook and Discussion of whether most published research is false. Other Sources: Stat Primer by Bud Gerstman of San Jose State University; Statistical forecasting notes by Robert Nau of Duke University (related: RegressIt Excel add-in by Robert Nau); CADDIS Volume 4: Data Analysis (EPA); The little handbook of statistical practice by Gerard E.

Stat Trek Tutorial; Statistics at square 1 by T.; Concepts and applications of inferential statistics by Richard Lowry of Vassar College; CAST by W.; SticiGui by P.

Coursera- R Programming

Coursera-Data Toolbox. Griffith Feeney Consulting. Introduction to Data Science with R - O’Reilly Media. Jtleek/datasharing. DARPA - Open Catalog. XDATA is developing an open source software library for big data to help overcome the challenges of effectively scaling to modern data volume and characteristics. The program is developing the tools and techniques to process and analyze large sets of imperfect, incomplete data. StatsLife - Significance back issues. Spurious Correlations. ISLP — Poster Competition 2014-2015. National Statistical Service § Sample Size Calculator. Tiimgreen/github-cheat-sheet. Computing for Social Sciences, Winter 2015. STATISTICS.

Databases

Coding. Maps and Viz. Examples. Bibliography. Python.