


Summer 2010, R: ggplot2 Intro. Intro: When it comes to producing graphics in R, there are basically three options for your average user. Base graphics: I've written up a pretty comprehensive description of the use of base graphics here, and don't intend to extend beyond that. Both and make creating plots of multivariate data easier. The website for ggplot2 is here: Basics: ggplot2 is meant to be an implementation of the Grammar of Graphics, hence gg-plot. Plots convey information through various aspects of their aesthetics: x position, y position, size of elements, shape of elements, and color of elements. The elements in a plot are geometric shapes, like points, lines, line segments, bars, and text. Some of these geometries have their own particular aesthetics. Points: point shape, point size. Lines: line type, line weight. Bars: y minimum, y maximum, fill color, outline color. Text: label value. The values represented in the plot are the product of various statistics. Layer by Layer. Displaying Statistics.
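As a rough illustration of the ideas above (aesthetics, geometries, statistics, and layers), here is a minimal sketch. It assumes the ggplot2 package is installed and uses the built-in mtcars data set; the choice of variables is purely illustrative and is not taken from the original tutorial.

```r
# Assumption: ggplot2 is installed; mtcars ships with base R.
library(ggplot2)

# Base layer: map two variables to the x and y position aesthetics.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # Point geometry: colour and size are mapped to data,
  # while the point shape is set to a fixed value.
  geom_point(aes(colour = factor(cyl), size = hp), shape = 16) +
  # Second layer: a statistic (a fitted linear trend) drawn as a line.
  geom_smooth(method = "lm", se = FALSE)

# Printing the object renders the plot on the current device.
print(p)
```

Each `+` adds one layer, so the plot can be built up piece by piece and inspected at any stage.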

rOpenSci (GSoC 2012 project). Summary: Dynamic access and visualization of scientific data repositories. Description: rOpenSci is a collaborative effort to develop R-based tools for facilitating Open Science. Projects in rOpenSci fall into two categories: those for working with the scientific literature, and those for working directly with the databases. See a complete list of our R packages currently in development. The student could choose to work on a package for a particular data repository of interest, or develop tools for visualization and exploration that could function across the existing packages. Skills required: Should be able to use R to perform data manipulation and aggregation. Mentor: The rOpenSci dev team, Carl Boettiger, Scott Chamberlain, and Karthik Ram, with support from rOpenSci advisors Hadley Wickham, Duncan Temple Lang, Bertram Ludascher, JJ Allaire and Matt Jones.

Model visualisation. This page lists my published software for model visualisation. This work forms the basis for the third chapter of my thesis. classifly: Explore classification boundaries in high dimensions. Given p-dimensional training data containing d groups (the design space), a classification algorithm (classifier) predicts which group new data belongs to. Generally the input to these algorithms is high dimensional, and the boundaries between groups will be high dimensional and perhaps curvilinear or multi-faceted. clusterfly: Explore clustering results in high dimensions. Typically, there is somewhat of a divide between statistics and visualisation software. There are also some custom methods for certain types of clustering, mostly inspired by the work of Dr Dianne Cook: Self organising maps (aka Kohonen neural networks). meifly: Models explored interactively. Installation: Please make sure you have a current version of R and rggobi installed, then use the following R code: Presentations/publications

R Programming. Welcome to the R programming Wikibook. This book is designed to be a practical guide to the R programming language[1]. R is free software designed for statistical computing. There is already great documentation for the standard R packages on the Comprehensive R Archive Network (CRAN)[2] and many resources in specialized books, forums such as Stackoverflow[3] and personal blogs[4], but all of these resources are scattered and therefore difficult to find and to compare. How can you share your R experience? Explain the syntax of a command. Compare the different ways of performing each task using R. Try to make unique examples based on fake data (i.e. simulated data sets). As with any Wikibook, please feel free to make corrections, expand explanations, and make additions where necessary. Some rules: Prerequisites: We assume that readers have a background in statistics. We also assume that readers are familiar with computers and that they know how to use software with a command-line interface.

Highland Statistics Ltd. Outline: This book presents Generalized Linear Models (GLM) and Generalized Linear Mixed Models (GLMM) based on both frequency-based and Bayesian concepts. The book uses the functions glm, lmer, glmer, glmmADMB, and also JAGS from within R. R code to construct, fit, interpret, and comparatively evaluate models is provided at every stage. Readers of this book have free access to: Chapter 1 of Zero Inflated Models and Generalized Linear Mixed Models with R. (2012a) Zuur, Saveliev, Ieno. See the Preface (and the text below) for how to access the pdfs of these chapters. Price and Order the book: The paperback is priced at 49 GBP. Copyright statement: This book is copyright material from Highland Statistics Ltd. Data sets and R code used in the book. Video file with general comments: Alain Zuur. Support chapters.
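To give a flavour of what constructing, fitting, interpreting, and comparing models with glm can look like, here is a minimal sketch on simulated data. It is not taken from the book: the data, variable names, and coefficient values are all made up for illustration, and only base R is used.

```r
# Simulated (fake) count data; every value below is an illustrative assumption.
set.seed(42)
n <- 200
x <- runif(n)                                # a single continuous covariate
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))   # Poisson counts with a log link

# Fit a Poisson GLM and an intercept-only model for comparison.
fit      <- glm(y ~ x, family = poisson(link = "log"))
fit_null <- glm(y ~ 1, family = poisson(link = "log"))

# Interpret: estimated coefficients on the log scale.
print(coef(summary(fit)))

# Compare: AIC for both models (lower is better).
print(AIC(fit, fit_null))
```

Mixed models with lmer or glmer follow a similar fit-then-compare pattern but need the lme4 package, which is deliberately left out of this sketch.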

RStudio Server Amazon Machine Image (AMI) - Louis Aslett. Current AMI Quick Reference (27th Jun 2015). Amazon instance type reference. Click to launch through AWS web interface: What’s new recently? Easy Dropbox setup to make syncing files on/off server easy, including selective folder sync. Preinstalled RStudioAMI R package for server control. HVM AMIs for full current generation instance support. Defaults to high speed SSD drives (faster, zero IO costs, only $1 per month in most regions). Amazon’s EC2 platform provides a convenient environment for rapidly procuring computational resources in the cloud. To get started with the Amazon cloud, you must first sign up for an AWS account if you don’t already have one. Click here for a simple video guide to using the AMIs listed here, or for more detailed information read on. What is this? If you want to run a server in the Amazon cloud, you have to select what system you are going to boot up. In particular, many common tools and dependencies are built-in. Why an RStudio AMI? AMI Release History

Building an R Hadoop System - R and Data Mining. This page shows how to build an R Hadoop system, and presents the steps to set up my first R Hadoop system in single-node mode on Mac OS X. After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about 2 weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. Here I’d like to share my experience and the steps to achieve that. Hopefully it will make it easier for R users who are new to Hadoop to try RHadoop. Before going through the complex steps below, let’s have a look at what you can get, to give you some motivation to continue. Now let’s start. 1. 1.1 Download Hadoop: Download Hadoop (hadoop-1.1.2-bin.tar.gz) at and then unpack it. 1.2 Set JAVA_HOME: In conf/, add the line below: export JAVA_HOME=/Library/Java/Home 1.3 Set up Remote Desktop and Enabling Self-Login: ssh-keygen -t rsa -P "" ; cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys 2. 2.1 Start Hadoop

10 tips for making your R graphics look their best. So you've spent hours slaving over the code for a beautiful statistical graphic in R, and now you're ready to show it to the world. You might be printing it, embedding it in a document, or displaying it on the web. Don't do your graph a disservice by causing it to look anything less than perfect in its final venue. Here are 10 tips to help make sure your graphic will always look its best. 1. Call the right device driver from a script. It's tempting to just create graphics on the on-screen device (such as X11 on Linux or Quartz on MacOS) and then just use "Save As..." from the menu. The best practice is to create a script file that begins with a call to the device driver (usually pdf or png), runs the graphics commands, and then finishes with a call that closes the device. For example: png(file="mygraphic.png",width=400,height=350); plot(x=rnorm(10),y=rnorm(10),main="example") 2. If you plan to print your graphic, you want to use a vector-based format. 5. For PNG graphs, it's a bit trickier.
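The script pattern described in tip 1 (open a device, draw, close the device) might look like the sketch below. It uses the vector-based pdf device from tip 2; the output path, page size, and seed are illustrative assumptions rather than values from the original article.

```r
# Hypothetical output location; a temporary directory keeps the example tidy.
out_file <- file.path(tempdir(), "mygraphic.pdf")

set.seed(1)                                   # make the random example reproducible

pdf(file = out_file, width = 6, height = 5)   # open the device; size is in inches
plot(x = rnorm(10), y = rnorm(10), main = "example")
invisible(dev.off())                          # close the device so the file is written
```

For a bitmap, the same script works with png() in place of pdf(), where width and height are given in pixels by default.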