

Related:  R Resources

Euclid - Breakthrough analytics for offline retail

Personal R Packages | Not So Standard Deviations I came across this R package on GitHub, and it made me so excited that I decided to write a post about it. It’s a compilation by Karl Broman of various R functions that he’s found helpful to write throughout the years. Wouldn’t it be great if incoming graduate students in Biostatistics/Statistics were taught to create a personal repository of functions like this? Not only is it a great way to learn how to write an R package, but it also encourages good coding techniques for newer students (since it encourages them to write separate functions with documentation). It also allows for easy reproducibility and collaboration both within the school and with the broader community. (Note that install_github is a function in the devtools package.) For whatever reason, when I think of R packages, I think of big, unified projects with a specified scientific aim.
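As a rough illustration of the "separate functions with documentation" habit the post describes, here is a minimal sketch of one documented helper of the kind that might live in a personal package. The function name `se_mean` and its arguments are hypothetical and are not taken from Karl Broman's package; the `#'` comments follow the roxygen2 convention, which is one common way (not the only one) to document package functions.

```r
#' Standard error of the mean (hypothetical example helper)
#'
#' @param x Numeric vector.
#' @param na.rm Logical; drop missing values before computing?
#' @return A single numeric value: sd(x) / sqrt(n).
se_mean <- function(x, na.rm = FALSE) {
  # Optionally remove missing values so they do not propagate to the result
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Example call on a small made-up vector
se_mean(c(2, 4, 4, 6))
```

Keeping each helper small and documented like this is what makes it easy to bundle the functions into a package later.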

Probability and Statistics Cookbook | Matthias Vallentin The cookbook contains a succinct representation of various topics in probability theory and statistics. It provides a comprehensive reference reduced to the mathematical essence, rather than aiming for elaborate explanations. Last updated: January 24, 2014. Language: English. The LaTeX source code is available on GitHub and comes with a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To reproduce in a different context, please contact me. The cookbook aims to be language agnostic and factors out its textual elements into a separate dictionary. The current translation setup is heavily geared to Roman languages, as this was the easiest way to begin with. Here are the 3 most recent entries of the changelog file (for all versions of the cookbook): 2014-01-24 Matthias Vallentin * Fix wrong denominator in alternative CLT representations.

Principal Component Analysis
A simple example
Consider 100 students with Physics and Statistics grades shown in the diagram below. The data set is in marks.dat. If we want to compare among the students which grade should be a better discriminating factor? Physics or Statistics? Surely Physics, since the variation is larger there. Here the direction of maximum variation is like a slanted straight line.
    dat = read.table("marks.dat",head=T)
    dim(dat)
    names(dat)
    pc = princomp(~Stat+Phys,dat)
    pc$loading
Notice the somewhat non-intuitive syntax of the princomp function. R has returned two principal components.
    pc
to learn the amount of spread of the data along the chosen directions.
    names(pc)
We shall not go into all these here.
    pc$scores
Higher dimensions
Most statisticians consider PCA a tool for reducing dimension of data. The same conclusion may be obtained by PCA.
Putting it to action
    quas = read.table("SDSS_quasar.dat",head=T)
    dim(quas)
    names(quas)
    quas = na.omit(quas)
    dim(quas)
Now we shall apply PCA. to see them.
    plot(pc)
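The two-variable example above can be tried without marks.dat. The sketch below uses simulated grades as a stand-in: the column names Stat and Phys follow the snippet, but the numbers are made up purely for illustration, and `prop_var` is a name introduced here. It shows the same princomp call plus one way to read off how much of the spread each component carries.

```r
# Simulated stand-in for marks.dat (assumption: 100 students, two grades,
# with Physics more variable than Statistics, as the text describes)
set.seed(1)
n <- 100
Phys <- rnorm(n, mean = 60, sd = 15)
Stat <- 0.5 * Phys + rnorm(n, mean = 30, sd = 5)
dat <- data.frame(Stat = Stat, Phys = Phys)

# Same formula interface as in the tutorial snippet
pc <- princomp(~ Stat + Phys, data = dat)

pc$loadings      # directions of the two principal components
pc$sdev          # spread of the data along each direction
head(pc$scores)  # each student's coordinates in the new axes

# Share of the total variance carried by each component
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
prop_var
```

Because the simulated grades are strongly correlated, most of the variance should fall on the first component, which is the "slanted straight line" direction the text mentions.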

Project Daytona: Iterative MapReduce on Windows Azure Microsoft has developed an iterative MapReduce runtime for Windows Azure, code-named Daytona. Project Daytona is designed to support a wide class of data analytics and machine-learning algorithms. It can scale to hundreds of server cores for analysis of distributed data. Project Daytona was developed as part of the eXtreme Computing Group’s Cloud Research Engagement Initiative. News On July 26, 2011, we released an updated Daytona community technical preview (CTP) that contains fixes that are related to scalability. Overview Project Daytona on Windows Azure is now available, along with a deployment guide, developer and user documentation, and code samples for both data analysis algorithms and client application. Included in the CTP Refresh (July 26, 2011) This refresh to the Daytona CTP contains the following enhancements: Included in the CTP Release (July 18, 2011) About Project Daytona

D G Rossiter - Publications & Computer Programs Rossiter, DG 2012. Introduction to the R Project for Statistical Computing for use at ITC 14-Aug-2012, v + 136 pp. (First version 2003) On-line, version 4.0 (3 Mb) Rossiter, DG 2014. Tutorial: Using the R Environment for Statistical Computing: An example with the Mercer-Hall wheat yield dataset Version 2.9, 09-Jan-2013. iv+234 pp.

Data Science Toolkit

Exploratory Data Analysis and Regression in R
Exploratory Data Analysis (EDA) and Regression
This tutorial demonstrates some of the capabilities of R for exploring relationships among two (or more) quantitative variables.
Bivariate exploratory data analysis
We begin by loading the Hipparcos dataset used in the descriptive statistics tutorial, found at
    hip <- read.table(" header=T,fill=T)
    names(hip)
    attach(hip)
In the descriptive statistics tutorial, we considered boxplots, a one-dimensional plotting technique.
    boxplot(Vmag~cut(B.V,breaks=(-1:6)/2), notch=T, varwidth=T, las=1, tcl=.5, xlab=expression("B minus V"), ylab=expression("V magnitude"), main="Can you find the red giants?")
The notches in the boxes, produced using "notch=T", can be used to test for differences in the medians (see boxplot.stats for details).
Scatterplots
    plot(Vmag,B.V)
    plot(Vmag,B.V,pch=".")
Let's now use exploratory scatterplots to locate the Hyades stars.
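The binned, notched boxplot can be explored without the Hipparcos file. The sketch below uses simulated values in place of the real data: the names Vmag and B.V follow the snippet, but the numbers are invented, and `bins` and `bp` are names introduced here. It draws the same kind of plot and then pulls out the notch limits that the plot displays.

```r
# Simulated stand-in for the Hipparcos columns (assumption: made-up values)
set.seed(42)
B.V  <- runif(500, min = -0.5, max = 3)
Vmag <- 8 + 1.5 * B.V + rnorm(500)

# Bin B.V into half-unit intervals, as in the tutorial call
bins <- cut(B.V, breaks = (-1:6) / 2)

# Notched boxplots per bin; box width reflects the bin's sample size
bp <- boxplot(Vmag ~ bins, notch = TRUE, varwidth = TRUE, las = 1,
              xlab = "B minus V", ylab = "V magnitude")

bp$stats[3, ]  # median of Vmag in each bin
bp$conf        # lower and upper notch limits for each bin
```

Two bins whose notch intervals in `bp$conf` do not overlap are usually read as having different medians, which is the informal test the text refers to.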

Real time Site Personalization and behavioral targeting solution

The R Project for Statistical Computing

Social Media and Text Mining Analytics | Social CRM Tools | Collective Intellect Oracle Social Cloud is a cloud service that helps you manage and scale your relationship with customers on social media channels. Oracle has integrated the best-in-class social relationship management (SRM) components - social listening, social engagement, social publishing, social content & apps, and social analytics - into one unified cloud service to give you the most complete SRM solution on the market. Why Oracle? Only Oracle can connect every interaction your customer has with your brand.

R Starter Kit This page is intended for people who: These materials have been collected from various places on our website and have been ordered so that you can, in step-by-step fashion, develop the skills needed to conduct common analyses in R.
Getting familiar with R
Class notes: There is no point in waiting to take an introductory class on how to use R.
Recommended Books
Introducing R
Getting familiar with the statistical procedures
Textbook examples: We have examples from popular textbooks and worked them out using R.
Going further
Frequently Asked Questions: We have a list of frequently asked questions (FAQs) regarding R.
The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.

Sentiment and Text Analytics: Lexalytics, Inc.