A guide to tools for collaboration with R
This is a brief guide to using R in collaborative, social ways. R is a powerful open-source programming language for data analysis, statistics, and visualization, but much of its power derives from a large, engaged community of users. This is an introduction to tools for engaging the community to improve your R code and collaborate with others.
Getting started with the `boot` package in R for bootstrap inference
The package boot has elegant and powerful support for bootstrapping. In order to use it, you have to repackage your estimation function as follows. R has a very elegant and abstract notation for array indexes. Suppose there is an integer vector OBS containing the elements 2, 3, 7, i.e. that OBS <- c(2,3,7). Suppose x is a vector.
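A minimal sketch of the index notation and of an estimation function repackaged for boot, under stated assumptions: the example vector x and the helper name mean_stat are hypothetical and do not come from the original guide. Only OBS is taken from the text.

```r
# OBS is the index vector from the text; x is a hypothetical example vector.
OBS <- c(2, 3, 7)
x <- c(10, 20, 30, 40, 50, 60, 70)

# Indexing with an integer vector selects those positions, in order.
x[OBS]  # elements 2, 3 and 7 of x: 20 30 70

# boot() calls the statistic with the data and a vector of resampled
# indices, so the estimator is written in terms of data[idx].
# mean_stat is a hypothetical name chosen for this illustration.
mean_stat <- function(data, idx) mean(data[idx])

library(boot)
set.seed(1)  # make the resampling reproducible
b <- boot(data = x, statistic = mean_stat, R = 200)
b$t0  # the statistic on the original data, i.e. mean(x) = 40
```

The same pattern should carry over to other estimators: compute on data[idx] (or data[idx, ] for a data frame) rather than on data directly, so that each bootstrap replicate sees its own resample.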
Introduction to Statistical Learning
An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. This book provides an introduction to statistical learning methods.
Lexical scope and function closures in R
Introduction: R is different to many “easy to use” statistical software packages – it expects to be given commands at the R command prompt. This can be intimidating for new users, but is at the heart of its power. Most powerful software tools have an underlying scripting language.
Bootstrapping
Nonparametric Bootstrapping: The boot package provides extensive facilities for bootstrapping and related resampling methods. You can bootstrap a single statistic (e.g., a median) or a vector (e.g., regression weights). This section will get you started with basic nonparametric bootstrapping. The main bootstrapping function is boot( ) and has the following format: bootobject <- boot(data= , statistic= , R=, ...) where
R tips pages
This page introduces the basics of working with data sets having multiple variables, often of several types. The focus here is on data frames, which are the most convenient data objects in R. Matrices and lists are also useful data objects, and these are introduced briefly at the end.
Missions: Using Git: Setup Git
Git is a version control system (VCS) created by Linus Torvalds, the creator of the Linux kernel. Git is known as a “distributed” VCS, or DVCS. This means that each user's copy of the code is a fully working repository and includes all previous commit information.
Statistics with R
Warning: Here are the notes I took while discovering and using the statistical environment R. However, I do not claim any competence in the domains I tackle: I hope you will find those notes useful, but keep your eyes open -- errors and bad advice are still lurking in those pages... Should you want it, I have prepared a quick-and-dirty PDF version of this document.
Bricks not monoliths
Chapter 32 of Tao Te Programming advises you to make bricks instead of monoliths. Here is an example. The example is written with the syntax of R and is a data analysis, but the principle is valid no matter what language you use or what your task is.
Random forests - classification description
Contents: Introduction; Overview; Features of random forests; Remarks; How Random Forests work; The oob error estimate; Variable importance; Gini importance; Interactions; Proximities; Scaling; Prototypes; Missing values for the training set; Missing values for the test set; Mislabeled cases; Outliers; Unsupervised learning; Balancing prediction error; Detecting novelties; A case study - microarray data; Classification mode; Variable importance; Using important variables; Variable interactions; Scaling the data; Prototypes; Outliers; A case study - dna data; Missing values in the training set; Missing values in the test set; Mislabeled cases; Case Studies for unsupervised learning; Clustering microarray data; Clustering dna data; Clustering glass data; Clustering spectral data; References.
Introduction: This section gives a brief overview of random forests and some comments about the features of the method.
Overview: We assume that the user knows about the construction of single classification trees.