Stat Articles/Blogs

Data Science Blogs. 100 Savvy Sites on Statistics and Quantitative Analysis. Nate Silver’s unprecedentedly accurate prediction of state-by-state election results in the most recent presidential race was a watershed moment for the public awareness of statistics.

100 Savvy Sites on Statistics and Quantitative Analysis

While data gathering and analysis has become a massive industry in the past decade, it hasn’t always been as well covered in the press or publicly accessible as it is now. With more and more of our daily interactions being mediated through computers and the internet, it is easier than ever to gather detailed quantitative data and do statistical analysis on that data to derive valuable information and predictions from it.

Knowledge of statistics and quantitative analysis techniques is more valuable than ever. From biostatisticians to politicians and economists, people in every field are using statistics to further their careers and knowledge. These sites are some of the most useful, informative, and comprehensive on the web covering stats and quantitative analysis. My advise on what you need to do to become a data scientist. Good Problems To Solve In Data Science. Thoughts on statistical consulting. The Statistics Department at UW-Madison has a course on statistical consulting, offered each semester.

Thoughts on statistical consulting

I’m often asked to give a lecture, which I do in an informal way: summarizing my experiences and answering questions. I thought it might be useful to write my thoughts on statistical consulting here: why, how, and difficulties. Interview with Nick Chamandy, statistician at Google. Nick Chamandy received his M.S. in statistics from the University of Chicago, his Ph.D. in statistics at McGill University and joined Google as a statistician.

Interview with Nick Chamandy, statistician at Google

We talked to him about how he ended up at Google, what software he uses, and how big the Google data sets are. To read more interviews - check out our interviews page. SS: Which term applies to you: data scientist, statistician, computer scientist, or something else? NC: I usually use the term Statistician, but at Google we are also known as Data Scientists or Quantitative Analysts. What the Fox Knows. FiveThirtyEight is a data journalism organization.

What the Fox Knows

Let me explain what we mean by that, and why we think the intersection of data and journalism is so important. If you’re a casual reader of FiveThirtyEight, you may associate us with election forecasting, and in particular with the 2012 presidential election, when our election model “called” 50 out of 50 states right. Certainly we had a good night. But this was and remains a tremendously overrated accomplishment.

Other forecasters, using broadly similar methods, performed just as well or nearly as well, correctly predicting the outcome in 48 or 49 or 50 states. Stories on Data Science and Analytics. Hadley Wickham has recently joined Rstudio as Chief Scientist.

Stories on Data Science and Analytics

Previously, he spent over four years as a statistics professor at Rice University. Hadley considers himself primarily a tool-builder for data scientists working in R. He is interested in tools that reduce the cognitive burden of solving data science problems. He likes to figure out good ways to think about problems, then match up cognitive tools with computational tools that make it easy to solve real problems. Hadley Wickham: Impact the world by being useful.

Hadley Wickham writes: The best way to impact the world as a data scientist or statistician is to be useful.

Hadley Wickham: Impact the world by being useful

This column gives my advice on being useful: • Write code • Work in the open • Teach • Tell the world (There are lots of other ways to be useful, but this is my path.) Interview with Hadley Wickham. I recently interviewed Hadley Wickham, the creator of ggplot2 and a famous R Stats person.

Interview with Hadley Wickham

He works for RStudio and his job is to work on Open Source software aimed at Data Geeks. Hadley is famous for his contributions to Data Science tooling and inspires a lot of other languages! I include some light edits. 1. What project have you worked on do you wish you could go back to, and do better? One thing that I’m particularly excited about is adding an official extension mechanism, so that others can extend ggplot2 by creating their own geoms, stats etc. Google Statistician uses R and other programming tools. A great interview on the Simply Statistics blog with Google's Nick Chamandy, PhD in Statistics.

Google Statistician uses R and other programming tools

Explains that he mainly uses R among other tools to perform his work at Google. Also of note is the active data science community within Google that uses R as well as some other interesting tools. Note that they use a lot of data at Google, understandably, and that R usually cannot handle the size. They do a lot of ad hoc reduction of the data with tools like MapReduce, Go, and even an R API. How data-driven companies use R to compete. Using R packages and education to scale Data Science at Airbnb — Airbnb Engineering & Data Science. One of my favorite things about being a data scientist at Airbnb is collaborating with a diverse team to solve important real-world problems.

Using R packages and education to scale Data Science at Airbnb — Airbnb Engineering & Data Science

We are diverse not only in terms of gender, but also in educational backgrounds and work experiences. Our team includes graduates from Mathematics and Statistics programs, PhDs in fields from Education to Computational Genomics, veterans of the tech and finance worlds, as well as former professional poker players and military veterans. Statistical Consulting in R, Matlab, SAS, SPSS, Stata. Almost all serious statistical analysis is done in one of the following packages: R (S-PLUS), Matlab, SAS, SPSS and Stata.

Statistical Consulting in R, Matlab, SAS, SPSS, Stata

I have expertise in each of those packages but it does not mean that each of those packages is good for a specific type of analysis. In fact, for most advanced areas only 2-3 packages will be suitable, providing enough functionality or enough tools to implement this functionality easily. For example, a very important area of Markov Chain Monte Carlo is doable in R, Matlab and SAS only, unless you want to rely on convoluted macros written by random users on the web. The table at the end of this page compares the five packages in great detail. R and Matlab are the richest systems by far. Do doctors understand test results? 6 July 2014, last updated at 19:04 ET. By William Kremer, BBC World Service. Are doctors confused by statistics? A new book by one prominent statistician says they are - and that this makes it hard for patients to make informed decisions about treatment. In 1992, shortly after Gerd Gigerenzer moved to Chicago, he took his six-year-old daughter to the dentist.

She didn't have toothache, but he thought it was about time she got acquainted with the routine of sitting in the big reclining chair and being prodded with pointy objects. What is your 'effective age'? How old are you? You might, quite reasonably, calculate your chronological age as the time elapsed since you were born. But what is the effective age of your body? If you want to find out, you can go to many websites which will tell you your ‘real age’, such as this one by Doctor Oz, or your 'health age', ‘vitality age’, or ‘biological age’. For all these calculators, you put in various characteristics of your health and habits, and out pops an assessment of, say, your biological age or health age. Why the Quantified Will Inherit the Earth. I’m the co-founder of Koalify, a personal analytics startup. Check out www.koalify.com to start improving your life with personal analytics today. Someone I respect told me “it’s easier to sell painkillers than vitamins.”

He’s right, but it’s also an oversimplification. Today’s skipped vitamins are tomorrow’s painkillers – just ask retail execs who didn’t take big data seriously during the rise of Walmart. There isn’t a major company in the world today that doesn’t put serious investment behind understanding their business/customers through data. Best and worst graduate degrees for jobs. It’s that time of year when college graduates ponder their future plans, and those heading for more higher learning put down deposits for grad school tuition. In a knowledge economy, the pay gap is the widest it’s been in a generation, between those with more education, versus those with less. Which degrees are the best investment? Salary may not be the sole motivation for pursuing a graduate degree, of course. But it makes sense to know the outlook for someone on your educational pathway before ponying up – or, taking on a huge long-term debt (in the U.S. today, average tuition for a graduate degree runs $36,000 to $63,000 a year.)

To determine the best and worst graduate degrees for jobs, Fortune consulted the careers site, PayScale. The ranking is based upon these factors: Long-term outlook for job growth. My data is bigger than yours. Analyzing the fragility of BigData with Nassim N. Taleb. Gil Press (BigData guru writing for Forbes) as well as others [1][2] [3] have recently suggested that organizations can become Antifragile (gain from disorder) by adopting a BigData strategy.

The concept of Antifragility has been described at length by Nassim Nicholas Taleb [1] in his brilliant book. For those who are new to the subject of Antifragility here is a video summary where Taleb gives a general overview: Now in the context of BigData, it seems both Gil Press as well as many other marketing departments operating in the name of BigData product vendors, have completely misunderstood the concept of Antifragility. How odd is a cluster of plane accidents? Statistics is the least important part of data science. A world without statistics. A heuristic for sorting science stories in the news. How computer analysts took over at Britain's top football clubs. What it means to be a statistician, according to Lord Claus Moser. UK businesses warn a data skills shortage could block potential of big data. Four out of five of the 45 data-intensive businesses interviewed by Nesta are struggling to find the skills they need to meet growing demand, according to a new report authored by Nesta in association with Creative Skillset and the Royal Statistical Society.

How to Speak Data Science. Data Science has its own language. Reproducible randomized controlled trials. Ten commandments for good data management. Usually when I am asked to give a few words to describe myself I say macroecologist or large-scale-ecologist. Those who can, teach statistics. Teaching statistical report-writing. It is difficult to write statistical reports and it is difficult to teach how to write statistical reports. When statistics is taught in the traditional way, with emphasis on the underlying mathematics the process of statistics is truncated at both ends. When we concentrate on the sterile analysis, the messy “writing stuff” is avoided. Is statistical enquiry a cycle? How to study statistics (Part 1) Difficult concepts in statistics. Parts and whole. Context – if it isn’t fun… Teaching Confidence Intervals.

Probability and Deity. What does it mean to understand statistics? Advise for teaching an R workshop. Data for teaching – real, fake, fictional. Why people hate statistics.