Resources to help you learn and use R

Say it in R with "by", "apply" and friends

R is a language, as Luis Apiolaza pointed out in his recent post. This is absolutely true, and learning a programming language is not much different from learning a foreign language. It takes time and a lot of practice to become proficient in it. Languages are full of surprises, in particular for non-native speakers. With languages you can get into the habit of using certain words and phrases, but sometimes you see or hear something that shakes you up again.

    f <- function(x) x^2
    sapply(1:10, f)
     [1]   1   4   9  16  25  36  49  64  81 100

It reminded me of the phrase that everything is a list in R. I remember how happy I felt when I finally understood the by function in R.

by

    do.call("rbind", as.list(
      by(iris, list(Species=iris$Species), function(x){
        y <- subset(x, select= -Species)
        apply(y, 2, mean)
      }
    )))
               Sepal.Length Sepal.Width Petal.Length Petal.Width
    setosa            5.006       3.428        1.462       0.246
    versicolor        5.936       2.770        4.260       1.326
    virginica         6.588       2.974        5.552       2.026

aggregate

    aggregate( . ~ Species, iris, mean)

ddply
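To see how these idioms relate, here is a minimal sketch that computes the same per-species means three ways in base R. The split() plus sapply() variant and the variable names (num_cols, by_split, by_by, by_agg) are illustrative additions, not part of the original post.

```r
# Per-species means of the numeric iris columns, three base-R ways.
num_cols <- setdiff(names(iris), "Species")

# 1. split() the rows by species, then apply colMeans() to each piece
#    (illustrative variant, not from the post)
by_split <- t(sapply(split(iris[num_cols], iris$Species), colMeans))

# 2. by() plus do.call("rbind", ...), as in the post
by_by <- do.call("rbind", as.list(
  by(iris, list(Species = iris$Species), function(x) {
    y <- subset(x, select = -Species)
    apply(y, 2, mean)
  })
))

# 3. aggregate() with a formula; this one returns a data frame
by_agg <- aggregate(. ~ Species, iris, mean)

print(round(by_split, 3))
```

The first two results are numeric matrices with one row per species, while aggregate() keeps Species as a column, which is often more convenient for further data-frame work.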

R Time Series Tutorial

The data sets used in this tutorial are available in astsa, the R package for the text. A detailed tutorial (and more!) is available in Appendix R of the text. This page is basically the quick fix from Edition 2 updated a bit. You can copy-and-paste the R commands (multiple lines are ok) from this page into R. Printed output is blue, and you wouldn't want to paste those lines into R, would you?

This quick fix is meant for people who are just starting to use R for time series analysis. If you're new to R/Splus, I suggest reading R for Beginners (a pdf file) first.

◊ Baby steps... your first R session.

Ok, now you're an expert useR. We're going to get astsa now:

    install.packages("astsa")  # install it ... you'll be asked to choose the closest CRAN mirror
    require(astsa)             # then load it (has to be done at the start of each session)

Let's play with the Johnson & Johnson data set. Typing jj at the prompt shows that jj is a collection of 84 numbers called a time series object. Now try a plot of the data.
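If astsa is not installed, a similar first look can be sketched with the JohnsonJohnson series that ships in base R's datasets package. This assumes it holds the same 84 quarterly earnings values as astsa's jj; the name jj_base and the plot options are illustrative choices, not the tutorial's own commands.

```r
# Stand-in for astsa's jj, using the series bundled with base R (assumption:
# same Johnson & Johnson quarterly earnings data).
jj_base <- datasets::JohnsonJohnson

length(jj_base)      # number of observations
class(jj_base)       # a "ts" (time series) object
frequency(jj_base)   # observations per year
start(jj_base)       # first time point (year, quarter)
end(jj_base)         # last time point (year, quarter)

# Basic time plot; points and lines together make the quarterly pattern visible
plot(jj_base, type = "o", ylab = "Quarterly earnings per share")
```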

Omegahat Statistical Computing R Tutorials--Data Frames

Preamble

There is plenty to say about data frames because they are the primary data structure in R. Some of what follows is essential knowledge. Some of it will be satisfactorily learned for now if you remember that "R can do that." I will try to point out which parts are which.

Definition and Examples (essential)

A data frame is a table, or two-dimensional array-like structure, in which each column contains measurements on one variable, and each row contains one case. Let's say we've collected data on one response variable or DV from 15 subjects, who were divided into three experimental groups called control ("contr"), treatment one ("treat1"), and treatment two ("treat2").

    contr    treat1    treat2
    ---------------------------
     22        32        30
     18        35        28
     25        30        25
     25        42        22
     20        31        33
    ---------------------------

    scores    group
    ----------------
      22      contr
      18      contr
      25      contr
      25      contr
      20      contr
      32      treat1
      35      treat1
      30      treat1
      42      treat1
      31      treat1
      30      treat2
      28      treat2
      25      treat2
      22      treat2
      33      treat2
    ----------------
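The second layout, one row per subject and one column per variable, is the one that matches the definition above. A minimal sketch of building it in R follows; the object names dat and group_means are illustrative and not taken from the tutorial.

```r
# One row per case: 15 scores plus the group each subject belonged to.
scores <- c(22, 18, 25, 25, 20,    # contr
            32, 35, 30, 42, 31,    # treat1
            30, 28, 25, 22, 33)    # treat2
group <- factor(rep(c("contr", "treat1", "treat2"), each = 5))

dat <- data.frame(scores, group)
str(dat)    # 15 obs. of 2 variables

# With the data in this shape, per-group summaries are a one-liner
group_means <- tapply(dat$scores, dat$group, mean)
print(group_means)
```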

Missing Data

In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., dividing zero by zero) are represented by the symbol NaN (not a number). Unlike SAS, R uses the same symbol for character and numeric data.

Testing for Missing Values

    is.na(x) # returns TRUE if x is missing
    y <- c(1,2,3,NA)
    is.na(y) # returns a vector (F F F T)

Recoding Values to Missing

    # recode 99 to missing for variable v1
    # select rows where v1 is 99 and recode column v1
    mydata$v1[mydata$v1==99] <- NA

Excluding Missing Values from Analyses

Arithmetic functions on missing values yield missing values.

    x <- c(1,2,NA,3)
    mean(x)             # returns NA
    mean(x, na.rm=TRUE) # returns 2

The function complete.cases() returns a logical vector indicating which cases are complete.

    # list rows of data that have missing values
    mydata[!complete.cases(mydata),]

The function na.omit() returns the object with listwise deletion of missing values.

    # create new dataset without missing data
    newdata <- na.omit(mydata)

Advanced Handling of Missing Data
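The steps above can be tried end to end on a small made-up data frame. The data frame toy and its values are invented purely for illustration; only the function calls come from the text.

```r
# Toy data (hypothetical): 99 is used as a missing-value code in v1
toy <- data.frame(v1 = c(5, 99, 7, 99),
                  v2 = c("a", NA, "c", "d"))

# NA versus NaN: 0/0 is NaN, while a nonzero number divided by zero is Inf
nan_value <- 0 / 0
inf_value <- 1 / 0

# Recode 99 to missing in column v1
toy$v1[toy$v1 == 99] <- NA

# Rows with at least one missing value, and the listwise-deleted data
incomplete <- toy[!complete.cases(toy), ]
clean <- na.omit(toy)

# Arithmetic with and without removing missing values
x <- c(1, 2, NA, 3)
mean_with_na <- mean(x)                 # NA
mean_without_na <- mean(x, na.rm = TRUE)

print(incomplete)
print(clean)
```

Note that is.na() returns TRUE for both NA and NaN, so the recoding and filtering steps treat them alike.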

Lexical scope and function closures in R | Darren Wilkinson's research blog

Introduction

R is different to many “easy to use” statistical software packages – it expects to be given commands at the R command prompt. This can be intimidating for new users, but is at the heart of its power. Most powerful software tools have an underlying scripting language.

Programming from the ground up

It is natural to want to automate (repetitive) tasks on a computer, to automate a “work flow”. Next, one can add in simple control structures, to support looping, branching and conditional execution. Although scripting is a simple form of programming, it isn’t “real” programming, or software engineering.

Functions and procedures

Procedures (or subroutines) are re-usable pieces of code which can be called from other pieces of code when needed. Functions are also re-usable pieces of code, but are mainly used to obtain a return-value that is computed on the basis of the given inputs.

Variable scope

Dynamic scope

Lexical scope

Function closures

Function closures for scientific computing
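As a rough illustration of the last few headings, the sketch below shows lexical scope and a closure in R. All names (f, g, make_counter, make_power) are hypothetical examples and are not taken from the blog post.

```r
# Lexical scope: f looks up x where f was defined, not where it is called.
x <- 1
f <- function() x
g <- function() {
  x <- 2     # local to g; invisible to f under lexical scope
  f()
}
g_result <- g()   # 1 (dynamic scope would give 2)

# Closure: the returned function keeps its defining environment alive,
# so each counter carries its own private count.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # update count in the enclosing environment
    count
  }
}
counter_a <- make_counter()
counter_b <- make_counter()
a1 <- counter_a()   # 1
a2 <- counter_a()   # 2
b1 <- counter_b()   # 1, independent of counter_a

# A function factory: fix a parameter once, get back a function of x only.
make_power <- function(p) function(x) x^p
square <- make_power(2)
squares <- sapply(1:10, square)
```

The factory pattern is one plausible reading of why closures are handy in scientific computing: a function of data and parameters can be turned into a function of the parameters alone and then handed to an optimiser or sampler.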

Climate Charts & Graphs | My R and Climate Change Learning Curve

Look what I found: two amazing charts

While doing some research for my statistics blog, I came across a beauty by Lane Kenworthy from almost a year ago, via this post by John Schmitt.

How embarrassing is the cost effectiveness of U.S. health care spending? When a chart is executed well, no further words are necessary. I'd only add that the other countries depicted are "wealthy nations".

Even more impressive is this next chart, which plots the evolution of cost effectiveness over time. Let's appreciate this beauty: Let the data speak for itself.

plyr