R Programming

Pbd: programming with big data in R. Climate Charts & Graphs | My R and Climate Change Learning Curve. Lexical scope and function closures in R | Darren Wilkinson's research blog. Introduction R is different to many “easy to use” statistical software packages – it expects to be given commands at the R command prompt. This can be intimidating for new users, but is at the heart of its power. Most powerful software tools have an underlying scripting language. This is because scriptable tools are typically more flexible, and easier to automate, script, program, etc. In fact, even software packages like Excel or Minitab have a macro programming language behind the scenes available for “power users” to exploit. Programming from the ground up It is natural to want to automate (repetitive) tasks on a computer, to automate a “work flow”.

Next, one can add in simple control structures, to support looping, branching and conditional execution. Although scripting is a simple form of programming, it isn’t “real” programming, or software engineering. Functions and procedures Variable scope Dynamic scope Lexical scope Function closures Function closures for scientific computing. R - Books. Missing Data. In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., dividing zero by zero) are represented by the symbol NaN (not a number).
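The lexical scope and function closure topics listed above from Darren Wilkinson's post can be illustrated with a minimal sketch. The names make_power, square, and cube are hypothetical and are not taken from the post.

```r
# A function factory: the returned function "closes over" p, the value
# that was in scope where the inner function was defined (lexical scope).
make_power <- function(p) {
  force(p)            # evaluate p now rather than lazily
  function(x) x^p     # the closure remembers its own p
}

square <- make_power(2)
cube   <- make_power(3)

square(4)   # 16
cube(2)     # 8

# A global variable named p does not change what the closures see.
p <- 10
square(4)   # still 16
```

Under dynamic scope the last call would pick up the global p; under R's lexical scope it uses the p captured when square was created.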

Unlike SAS, R uses the same symbol for character and numeric data.

Testing for Missing Values

    is.na(x) # returns TRUE if x is missing
    y <- c(1,2,3,NA)
    is.na(y) # returns a vector (F F F T)

Recoding Values to Missing

    # recode 99 to missing for variable v1
    # select rows where v1 is 99 and recode column v1
    mydata$v1[mydata$v1==99] <- NA

Excluding Missing Values from Analyses

Arithmetic functions on missing values yield missing values.

    x <- c(1,2,NA,3)
    mean(x) # returns NA
    mean(x, na.rm=TRUE) # returns 2

The function complete.cases() returns a logical vector indicating which cases are complete.

    # list rows of data that have missing values
    mydata[!

The function na.omit() returns the object with listwise deletion of missing values.

    # create new dataset without missing data
    newdata <- na.omit(mydata)

Advanced Handling of Missing Data.
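The missing-data functions described above can be tried end to end on a small example. This is only a sketch: the data frame below is made up for illustration, and the names id, v2, and incomplete are assumptions, not part of the original page.

```r
# Hypothetical data: 99 is used as a "missing" code in v1
mydata <- data.frame(
  id = 1:5,
  v1 = c(10, 99, 30, NA, 50),
  v2 = c("a", "b", NA, "d", "e")
)

# Recode 99 to NA; which() drops the NA produced by comparing NA == 99
mydata$v1[which(mydata$v1 == 99)] <- NA

is.na(mydata$v1)                  # FALSE TRUE FALSE TRUE FALSE
mean(mydata$v1)                   # NA
mean(mydata$v1, na.rm = TRUE)     # 30

# Rows with at least one missing value, and the listwise-deleted data
incomplete <- mydata[!complete.cases(mydata), ]
newdata    <- na.omit(mydata)
nrow(incomplete)                  # 3
nrow(newdata)                     # 2

# NaN versus NA: 0/0 is NaN, while 1/0 is Inf rather than NaN
is.nan(0 / 0)                     # TRUE
is.na(0 / 0)                      # TRUE (NaN also counts as missing)
is.infinite(1 / 0)                # TRUE
```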

R Tutorials--Data Frames. Preamble There is plenty to say about data frames because they are the primary data structure in R. Some of what follows is essential knowledge. Some of it will be satisfactorily learned for now if you remember that "R can do that." I will try to point out which parts are which. Set aside some time. This is a long one! Definition and Examples (essential) A data frame is a table, or two-dimensional array-like structure, in which each column contains measurements on one variable, and each row contains one case.
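As a minimal sketch of the definition above, the 15 scores from the three-group example in this tutorial could be arranged so that each row is one subject and each column is one variable. The names scores, group, score, and group_means are illustrative choices, not taken from the tutorial.

```r
# One row per subject (case), one column per variable
scores <- data.frame(
  group = factor(rep(c("contr", "treat1", "treat2"), each = 5)),
  score = c(22, 18, 25, 25, 20,    # contr
            32, 35, 30, 42, 31,    # treat1
            30, 28, 25, 22, 33)    # treat2
)

str(scores)    # 15 obs. of 2 variables
head(scores)

# Group means computed from the long layout
group_means <- tapply(scores$score, scores$group, mean)
group_means    # contr 22.0, treat1 34.0, treat2 27.6
```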

Let's say we've collected data on one response variable or DV from 15 subjects, who were divided into three experimental groups called control ("contr"), treatment one ("treat1"), and treatment two ("treat2").

    contr   treat1   treat2
    ---------------------------
    22      32       30
    18      35       28
    25      30       25
    25      42       22
    20      31       33
    ---------------------------

This is a proper data frame (and leave out the dashed lines, although in actual fact R could read this table just as you see it here). Here's the catch. Omegahat Statistical Computing. R Time Series Tutorial. The data sets used in this tutorial are available in astsa, the R package for the text. A detailed tutorial (and more!) is available in Appendix R of the text. This page is basically the quick fix from Edition 2 updated a bit. You can copy-and-paste the R commands (multiple lines are ok) from this page into R. This quick fix is meant for people who are just starting to use R for time series analysis.

If you're new to R/Splus, I suggest reading R for Beginners (a pdf file) first. ◊ Baby steps... your first R session. Ok, now you're an expert useR. We're going to get astsa now:

    install.packages("astsa") # install it ... you'll be asked to choose the closest CRAN mirror
    require(astsa) # then load it (has to be done at the start of each session)

Let's play with the Johnson & Johnson data set.

    data(jj) # load the data
    jj # print it to the screen
          Qtr1  Qtr2  Qtr3  Qtr4
    1960  0.71  0.63  0.85  0.44
    1961  0.61  0.69  0.92  0.55
       .     .     .     .     .
       .     .     .     .     .
    1979 14.04 12.96 14.85  9.99
    1980 16.20 14.67 16.02 11.61

Probability Distributions. Say it in R with "by", "apply" and friends. R is a language, as Luis Apiolaza pointed out in his recent post. This is absolutely true, and learning a programming language is not much different from learning a foreign language.

It takes time and a lot of practice to be proficient in it. I started using R when I moved to the UK, and I wonder whether I have a better understanding of English or R by now. Languages are full of surprises, in particular for non-native speakers. The other day I learned that there is courtesy and curtsey. With languages you can get into habits of using certain words and phrases, but sometimes you see or hear something that shakes you up again.

    f <- function(x) x^2
    sapply(1:10, f)
    [1]   1   4   9  16  25  36  49  64  81 100

It reminded me of the phrase that everything is a list in R. I remember how happy I felt when I finally understood the by function in R. by. aggregate. The aggregate function splits the data into subsets and computes summary statistics for each of them.

    aggregate( . ~ Species, iris, mean)

apply and tapply. Resources to help you learn and use R. R. Look what I found: two amazing charts. While doing some research for my statistics blog, I came across a beauty by Lane Kenworthy from almost a year ago (link) via this post by John Schmitt (link). How embarrassing is the cost effectiveness of U.S. health care spending? When a chart is executed well, no further words are necessary.
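Returning to the sapply, by, aggregate, and tapply functions quoted from the "Say it in R" entry, the following sketch runs each of them on the built-in iris data. The variable names (squares, species_means, sepal_means, by_summary) are assumptions made for this illustration.

```r
# sapply: apply a function to each element and simplify to a vector
f <- function(x) x^2
squares <- sapply(1:10, f)

# aggregate: mean of every numeric column within each Species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# tapply: one summary value per group for a single vector
sepal_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# by: apply a function to the rows of a data frame, split by a factor
by_summary <- by(iris[, 1:4], iris$Species, colMeans)

squares
species_means
sepal_means            # setosa 5.006, versicolor 5.936, virginica 6.588
by_summary[["setosa"]]
```

All three grouping calls give the same per-species means; they differ mainly in the shape of the result (data frame, named array, or list-like object).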

I'd only add that the other countries depicted are "wealthy nations". Even more impressive is this next chart, which plots the evolution of cost effectiveness over time. An important point to note is that the U.S. started out in 1970 similar to the other nations. Let's appreciate this beauty: Let the data speak for itself. Plyr. Knitr: Elegant, flexible and fast dynamic report generation with R | knitr. Overview The knitr package was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more).

This package is developed on GitHub; for installation instructions and FAQs, see README. This website serves as the full documentation of knitr, and you can find the main manual, the graphics manual and other demos / examples here. For a more organized reference, see the knitr book. Motivation One of the difficulties with extending Sweave is we have to copy a large amount of code from the utils package (the file SweaveDrivers.R has more than 700 lines of R code), and this is what the two packages mentioned above have done. Features Acknowledgements Misc. Short-refcard.pdf. Useful Links. ComputingPresentation.R.conditionals.pdf.

Community sites

R tools. R tutorials and lessons.