R and Data Mining

Summer 2010 — R: ggplot2 Intro

Intro

When it comes to producing graphics in R, there are basically three options for the average user, the first being base graphics. I've written up a fairly comprehensive description of how to use base graphics here, and don't intend to extend beyond that. The other two options both make creating plots of multivariate data easier. The website for ggplot2 is here:

Basics

ggplot2 is meant to be an implementation of the Grammar of Graphics, hence gg-plot. Plots convey information through various aspects of their aesthetics:

- x position
- y position
- size of elements
- shape of elements
- color of elements

The elements in a plot are geometric shapes, like:

- points
- lines
- line segments
- bars
- text

Some of these geometries have their own particular aesthetics:

- points: point shape, point size
- lines: line type, line weight
- bars: y minimum, y maximum, fill color, outline color
- text: label value

The values represented in the plot are the product of various statistics.

Layer by Layer

Displaying Statistics
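A minimal sketch of these ideas in ggplot2 code, assuming the package is installed and using R's built-in mtcars data (the variables chosen are illustrative, not taken from the original post). Each `+` adds a layer. The points map variables to x position, y position, color and size, while geom_smooth() and geom_bar() draw values that are computed by a statistic (a linear fit and a count) rather than taken directly from the data.

```r
# Assumes the ggplot2 package is installed; mtcars ships with base R.
library(ggplot2)

# Aesthetic mappings: x position, y position and colour each encode a variable.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  # Layer 1: point geometry, with point size as an extra mapped aesthetic.
  geom_point(aes(size = hp)) +
  # Layer 2: a statistic (a linear fit per colour group) drawn as lines.
  geom_smooth(method = "lm", se = FALSE)

# Bars whose heights come from a statistic: geom_bar() counts rows per group.
b <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "grey70", colour = "black")  # fixed (not mapped) fill and outline colour

# Building a plot object does not draw it; printing does.
print(p)
print(b)
```

Because each layer carries its own geometry and statistic, the same aesthetic mappings can be reused by several layers, which is what makes layering plots of multivariate data straightforward.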

Physics

The R Project

A Tutorial on Using Functions in R! (and their scoping)

Introduction

In a previous post, we covered part of R's control flow: the loop structures. In a subsequent one, we showed how to avoid explicit loops by means of functions that act on compound data in repetitive ways (the apply family of functions). Here, we introduce the notion of a function from the R programmer's point of view and illustrate the range of action that functions have within R code (their 'scope'). The post will highlight concepts such as:

- what R functions are and when to use them
- user-defined functions in R
- scoping in R
- developing your own functions in R
- return values and nested function calls in R
- R as a functional programming language

Want to learn more about functions in R?

What is a function?

In programming, we use functions to encapsulate sets of instructions that we want to use repeatedly or that, because of their complexity, are better self-contained in a subprogram and called when needed.

Functions in R

function(arglist) {body}

Points to note
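A short sketch of the points listed above, in R. The names raise, add_y, call_with_local_y and make_counter are hypothetical, chosen only for illustration. It shows the function(arglist) {body} form with a default argument, an explicit return(), a nested call, and R's lexical scoping, where a free variable is resolved in the environment in which the function was defined rather than the one it is called from.

```r
# A user-defined function: argument list in (), body in {}.
# 'power' has a default value, so it may be omitted in calls.
raise <- function(x, power = 2) {
  return(x^power)   # explicit return; the last expression would be returned anyway
}

# Nested function calls: the result of one call is an argument to another.
nested <- sqrt(raise(3, power = 4))   # sqrt(81) = 9

# Lexical scoping: the free variable 'y' in add_y is looked up where
# add_y was defined (the global environment), not where it is called.
y <- 10
add_y <- function(x) x + y
call_with_local_y <- function() {
  y <- 100        # local to this function; add_y does not see it
  add_y(1)
}
scoped <- call_with_local_y()   # 11, not 101

# Closures: a function returned by another function keeps its defining
# environment, so 'count' persists between calls.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # <<- assigns in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()            # returns 1
second <- counter()  # returns 2
```

The counter example is where R's functional side shows: functions are ordinary values that can be created, returned and stored, and they carry their environment with them.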

Learning R

R reference card

developers:projects:gsoc2012:ropensci

Summary: Dynamic access and visualization of scientific data repositories

Description: rOpenSci is a collaborative effort to develop R-based tools for facilitating Open Science. Projects in rOpenSci fall into two categories: those for working with the scientific literature, and those for working directly with databases. Visit the active development hub of each project on GitHub, where you can view and download source code, see updates, and follow or join the developer discussions of issues. Most of the packages work through an API provided by the resource (database, paper archive) to access data and bring it within reach of R's powerful data manipulation tools. See a complete list of our R packages currently in development. The student could choose to work on a package for a particular data repository of interest, or develop tools for visualization and exploration that could function across the existing packages.

Webinar | Introduction to R for Data Mining

For a quick start:

- Find a way of orienting yourself in the open source R world
- Have a definite application area in mind
- Set an initial goal of doing something useful and then build on it

In this webinar, we focus on data mining as the application area and show how anyone with just a basic knowledge of elementary data mining techniques can become immediately productive in R. We will:

- Provide an orientation to R's data mining resources
- Show how to use the "point and click" open source data mining GUI, rattle, to perform the basic data mining functions of exploring and visualizing data, building classification models on training data sets, and using these models to classify new data
- Show the simple R commands to accomplish these same tasks without the GUI
- Demonstrate how to build on these fundamental skills to gain further competence in R
- Move away from small test data sets and show that, with the same level of skill, one can analyze some fairly large data sets with RevoScaleR
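A minimal sketch of the same three tasks (explore, fit a classifier on training data, classify new data) using plain R commands instead of a GUI. This is not the webinar's material: it assumes the rpart package, which ships as a recommended package with standard R installations, and the built-in iris data; the seed and the 100/50 split are arbitrary choices.

```r
# Assumes the 'rpart' package (a recommended package bundled with standard
# R installations) and the built-in 'iris' data set.
library(rpart)

set.seed(42)                                   # reproducible random split
train_idx <- sample(nrow(iris), size = 100)    # 100 of 150 rows for training
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]                    # remaining 50 rows act as "new" data

# Explore: numeric summaries and a scatterplot matrix coloured by class.
summary(train)
pairs(train[, 1:4], col = as.integer(train$Species))

# Build a classification tree on the training data.
fit <- rpart(Species ~ ., data = train, method = "class")

# Classify the held-out rows and compare with their known labels.
pred <- predict(fit, newdata = test, type = "class")
confusion <- table(predicted = pred, actual = test$Species)
accuracy <- mean(pred == test$Species)
print(confusion)
print(accuracy)
```

Holding out rows the model never saw gives an honest estimate of how well it classifies new data; evaluating on the training rows would overstate accuracy.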

Cookbook for R » Cookbook for R

Quick-R: Home Page

Model visualisation

This page lists my published software for model visualisation. This work forms the basis for the third chapter of my thesis.

classifly: Explore classification boundaries in high dimensions. Given p-dimensional training data containing d groups (the design space), a classification algorithm (classifier) predicts which group new data belongs to.

clusterfly: Explore clustering results in high dimensions. Typically, there is somewhat of a divide between statistics and visualisation software. There are also some custom methods for certain types of clustering, mostly inspired by the work of Dr Dianne Cook, such as self organising maps (aka Kohonen neural networks).

meifly: Models explored interactively. Meifly is a tool that uses R and GGobi to explore ensembles of linear models, where we look at all possible main effects models for a given dataset (or a large subset of these models).

Installation: Please make sure you have a current version of R and rggobi installed, then use the following R code: