statistics by Terry M. Therneau Ph.D., Faculty, Mayo Clinic
About a year ago there was a query on the R help list about how to do "type 3" tests for a Cox model, which someone wanted because SAS does it. The SAS addition looked suspicious to me, but as the author of the survival package I thought I should understand the issue more deeply.

In-depth introduction to machine learning in 15 hours of expert videos
In January 2014, Stanford University professors Trevor Hastie and Rob Tibshirani (authors of the legendary Elements of Statistical Learning textbook) taught an online course based on their newest textbook, An Introduction to Statistical Learning with Applications in R (ISLR). I found it to be an excellent course in statistical learning (also known as “machine learning”), largely due to the high quality of both the textbook and the video lectures. And as an R user, I found it extremely helpful that they included R code to demonstrate most of the techniques described in the book. If you are new to machine learning (and even if you are not an R user), I highly recommend reading ISLR from cover to cover to gain both a theoretical and practical understanding of many important methods for regression and classification. It is available as a free PDF download from the authors’ website.
Chapter 1: Introduction (slides, playlist)
Data Gravity
The purpose of this site is to explore Data Gravity and Data Physics. By explore, we mean engage with the community in open discussion, with the goal of benefiting everyone in Software, Networking, Data, and Compute in the long term. Data Gravity was a concept first described in this blog post.

Getting started with the 'boot' package in R for bootstrap inference
The package boot has elegant and powerful support for bootstrapping. In order to use it, you have to repackage your estimation function as follows. R has very elegant and abstract notation in array indexes.
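The repackaging described above can be sketched roughly as follows. This is a minimal illustration, not the original post's code: the data vector `x` and the function names `median_stat` and `coef_stat` are hypothetical, and `mtcars` is simply R's built-in example data set. The point it shows is that `boot()` passes the statistic function the original data plus a vector of resampled indices, and the function does the subsetting itself using ordinary index notation.

```r
# boot ships with standard R distributions as a recommended package.
library(boot)

set.seed(42)  # arbitrary seed so the resampling is reproducible

# Hypothetical data, invented for illustration only.
x <- c(2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 4.9, 3.3)

# Single statistic: boot() calls this with the original data and a
# vector of resampled indices; the function applies the indices itself.
median_stat <- function(data, indices) {
  median(data[indices])
}

b <- boot(data = x, statistic = median_stat, R = 999)
b$t0                        # statistic computed on the original data
boot.ci(b, type = "perc")   # percentile confidence interval

# Vector of statistics: for a data frame, the indices select rows.
coef_stat <- function(data, indices) {
  fit <- lm(mpg ~ wt, data = data[indices, ])
  coef(fit)                 # intercept and slope, one column each in bc$t
}

bc <- boot(data = mtcars, statistic = coef_stat, R = 999)
dim(bc$t)                   # replicates in rows, coefficients in columns
```

Because the statistic function receives indices rather than a resampled copy of the data, the same pattern works for vectors (`data[indices]`) and for data frames (`data[indices, ]`).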
Our top 10 Data Science articles in 2014
2014 has been a year of growth for us. We now get 10x the traffic we got 12 months ago. It gives us immense satisfaction to be able to create something that is helping more and more people every day. We only hope that we can find more time to create more content for our audience!

Prediction model for the FIFA World Cup 2014
Like a last-minute goal, so to speak, Andreas Groll and Gunther Schauberger of Ludwig-Maximilians-University Munich announced their predictions for the FIFA World Cup 2014 in Brazil just hours before the opening game. Andreas Groll, already experienced in this field after his successful prediction of the European Championship 2012, and Gunther Schauberger set out to predict the 2014 World Cup champion based on statistical modeling techniques and R. A bit surprisingly, Germany is estimated to have the highest probability of winning the trophy (28.80%), only marginally exceeding the probability for Brazil (27.65%), the favorite according to most bookmakers.
Bootstrapping
Nonparametric Bootstrapping
The boot package provides extensive facilities for bootstrapping and related resampling methods. You can bootstrap a single statistic (e.g., a median) or a vector (e.g., regression weights). This section will get you started with basic nonparametric bootstrapping.

Must read books for Analysts (or people interested in Analytics)
One of the ways I continue my learning is reading. I read for 30 minutes before hitting the bed every day. This not only makes sure that I learn something daily, but also ends my day in a fulfilling manner.

Learn R for beginners with our PDF
With so much emphasis on getting insight from data these days, it's no wonder that R is rapidly rising in popularity. R was designed from day one to handle statistics and data visualization; it's highly extensible, with many new packages aimed at solving real-world problems; and it's open source (read "free"). If you're ready to learn, we have just the ticket: a free PDF of Computerworld's "Beginner's guide to R." Included in this 45-page guide:
40 Free Online Tools and Software to Improve Your Workflow
Jun 08 2011
Charts and graphs are the most effective ways to show the relationship between two different and interlinked entities. On a web page, a comprehensively designed flowchart, diagram or graph can be worth a thousand words.

Random forests - classification description
Contents: Introduction; Overview; Features of random forests; Remarks; How Random Forests work; The oob error estimate; Variable importance; Gini importance; Interactions; Proximities; Scaling; Prototypes; Missing values for the training set; Missing values for the test set; Mislabeled cases; Outliers; Unsupervised learning; Balancing prediction error; Detecting novelties; A case study - microarray data; Classification mode; Variable importance; Using important variables; Variable interactions; Scaling the data; Prototypes; Outliers; A case study - dna data; Missing values in the training set; Missing values in the test set; Mislabeled cases; Case Studies for unsupervised learning; Clustering microarray data; Clustering dna data; Clustering glass data; Clustering spectral data; References

Introduction
This section gives a brief overview of random forests and some comments about the features of the method.
Shiny - Tutorial
The How to Start Shiny video series will take you from R programmer to Shiny developer. Watch the complete tutorial here, or jump to a specific chapter by clicking a link below. The entire tutorial is two hours and 25 minutes long.
Part 1 - How to build a Shiny app
Part 3 - How to customize appearance
You will get the most out of these tutorials if you already know how to program in R, but not Shiny.