CRAN Task View: Design of Experiments (DoE) & Analysis of Experimental Data This task view collects information on R packages for experimental design and analysis of data from experiments. With a strong increase in the number of relevant packages, packages that focus on analysis only and do not make relevant contributions for design creation are no longer added to this task view. Please feel free to suggest enhancements, and please send information on new packages or major package updates if you think they belong here. Contact details are given on my Web page. Experimental design is applied in many areas, and methods have been tailored to the needs of various fields.
Mann-Whitney-Wilcoxon Test Two data samples are independent if they come from distinct populations and the samples do not affect each other. Using the Mann-Whitney-Wilcoxon Test, we can decide whether the population distributions are identical without assuming them to follow the normal distribution. Example In the data frame column mpg of the data set mtcars, there are gas mileage data of various 1974 U.S. automobiles. > mtcars$mpg  21.0 21.0 22.8 21.4 18.7 ...
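As a minimal sketch of the test described in this entry: the excerpt names only the mpg column, so the grouping variable used here (the transmission type column am of mtcars) is an assumption made for illustration. wilcox.test() is part of base R's stats package; exact = FALSE is set because mpg contains ties, so a normal approximation is used.

```r
# Assumption: compare gas mileage between the two transmission types in mtcars
# (am: 0 = automatic, 1 = manual). The excerpt itself only mentions mpg.
result <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)
print(result)

# At the 0.05 significance level, a small p-value rejects the null hypothesis
# that the two groups come from identical populations.
reject_null <- result$p.value < 0.05
print(reject_null)
```

No normality assumption is involved here, which is the point the entry makes about this test.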
AMSC689: Research Interaction Team in Penalized Regression (Fall 2013) Description Major advances in technology for genomic studies are bringing the prospect of personalized and individualized medicine closer to reality. Many of these advances are predicated on the ability to generate data at an unprecedented rate, posing a significant need for computational data analysis that is clinically and biologically useful and robust. This reading course will concentrate on the fundamental statistical methods required to meet this need. The goal is to be familiar with the key articles and concepts in the field of penalized regression, as well as have a sense of the unanswered questions and current research directions. No prior knowledge of biology is required.
Using R for Time Series Analysis - Time Series 0.2 documentation Reading Time Series Data The first thing that you will want to do to analyse your time series data will be to read it into R, and to plot the time series. You can read data into R using the scan() function, which assumes that your data for successive time points is in a simple text file with one column. For example, the file contains data on the age of death of successive kings of England, starting with William the Conqueror (original source: Hipel and Mcleod, 1994). The data set looks like this:
Age of Death of Successive Kings of England
#starting with William the Conqueror
#Source: McNeill, "Interactive Data Analysis"
60
43
67
50
56
42
50
65
68
43
65
34
...
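A minimal sketch of the scan() step described in this entry. The excerpt does not give the file location, so the code writes a hypothetical local copy to a temporary file; it contains only the twelve values visible in the excerpt, whereas the real file is longer. The names kings_file, kings and kings_ts are illustrative.

```r
# Hypothetical local copy of the data file: three header lines followed by
# one value per line. Only the values shown in the excerpt are included.
kings_file <- tempfile(fileext = ".dat")
writeLines(c(
  "Age of Death of Successive Kings of England",
  "#starting with William the Conqueror",
  "#Source: McNeill, \"Interactive Data Analysis\"",
  "60", "43", "67", "50", "56", "42", "50", "65", "68", "43", "65", "34"
), kings_file)

# skip = 3 tells scan() to ignore the three header lines
kings <- scan(kings_file, skip = 3)

# Store the values as a time series object and plot it
kings_ts <- ts(kings)
plot.ts(kings_ts)
```

scan() returns a plain numeric vector; wrapping it in ts() is what lets the time series plotting and modelling functions treat it as successive observations.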
Forecasting within limits It is common to want forecasts to be positive, or to require them to be within some specified range. Both of these situations are relatively easy to handle using transformations. Positive forecasts To impose a positivity constraint, simply work on the log scale.
First steps with Non-Linear Regression in R Drawing a line through a cloud of points (i.e. doing a linear regression) is the most basic analysis one may do. It sometimes fits the data well, but in some (many) situations the relationships between variables are not linear. In this case one may follow three different routes: (i) try to linearize the relationship by transforming the data, (ii) fit polynomial or complex spline models to the data or (iii) fit non-linear functions to the data. As you may have guessed from the title, this post will be dedicated to the third option. What is non-linear regression?
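To make the third option concrete, here is a minimal sketch of fitting a non-linear function with nls() from base R's stats package. The Michaelis-Menten curve, the simulated data and the starting values are all assumptions chosen for illustration; they are not taken from the excerpt.

```r
set.seed(1)

# Simulated data (assumption): a saturating curve y = a * x / (b + x)
# with true values a = 200 and b = 10, plus Gaussian noise.
x <- seq(1, 50, length.out = 60)
y <- 200 * x / (10 + x) + rnorm(60, sd = 5)

# nls() needs the model formula and rough starting values for the parameters
fit <- nls(y ~ a * x / (b + x), start = list(a = 150, b = 5))

# Estimated parameters; these should land near the values used to simulate
print(coef(fit))

# Overlay the fitted curve on the data
plot(x, y)
lines(x, predict(fit), col = "red")
```

Unlike lm(), nls() estimates the parameters iteratively, so poor starting values can prevent convergence.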
Data Mining With R: TIME SERIES using R FITTING ARIMA MODEL in R The Auto Regressive Integrated Moving Average (ARIMA) model is a generalisation of the Auto Regressive Moving Average (ARMA) model and is used to predict future points in a time series. But what is a time series? According to Google, "A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Examples of time series are ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average." More generally, it is a series of values of a quantity obtained at successive times, often with equal intervals between them. ARIMA models assume that the series is stationary, after differencing if necessary; a stationary time series is one whose mean, variance and autocorrelation are all constant over time. To check whether a time series is stationary, several statistical tests are available, namely: 1) For the Box.test, if p-value < 0.05 => the series is significantly autocorrelated (this test checks for serial correlation rather than stationarity itself) 2) For the adf.test, if p-value < 0.05 => stationary
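A minimal sketch of the checks and the model fit mentioned in this entry, on simulated data (the random walk and the ARIMA order are assumptions for illustration). Box.test() and arima() are in base R's stats package; adf.test() comes from the tseries package, so that call is guarded and only runs if tseries is installed. The null hypothesis of Box.test() is that the series is not autocorrelated, and the null hypothesis of adf.test() is that the series has a unit root (is non-stationary).

```r
set.seed(42)

# Simulated random walk (assumption): non-stationary in levels,
# stationary after one difference.
x  <- cumsum(rnorm(200))
dx <- diff(x)

# Ljung-Box test on the differenced series (null: no autocorrelation)
lb <- Box.test(dx, lag = 10, type = "Ljung-Box")
print(lb$p.value)

# Augmented Dickey-Fuller test, only if the tseries package is available.
# A warning about the p-value being smaller than the printed one is expected
# for clearly stationary input, so it is suppressed here.
if (requireNamespace("tseries", quietly = TRUE)) {
  adf_p <- suppressWarnings(tseries::adf.test(dx)$p.value)
  print(adf_p)
}

# Fit an ARIMA(1,1,0) model to the original series; d = 1 does the differencing
fit <- arima(x, order = c(1, 1, 0))

# Forecast the next five points
fc <- predict(fit, n.ahead = 5)
print(fc$pred)
```

Letting arima() difference the series through its order argument keeps the forecasts on the original scale, which is simpler than differencing by hand and integrating the predictions back.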
Introduction to Statistical Learning An Introduction to Statistical Learning with Applications in R Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani This book provides an introduction to statistical learning methods. It is aimed at upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences.
AboutHydrology: R resources for Hydrologists R is my statistical software of choice. I had a hard time convincing my Ph.D. students to adopt it, but finally they did, and, as usually happens, many of them became more proficient than me in the field. Now it seems natural to use it for everything, but this was not always the case. A list of introductory material is here.
Interpreting noise When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise. For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says “The unemployment rate has dropped by 0.1% for no apparent reason.” What is going on here is that the commentators are assuming we live in a noise-free world. They imagine that everything is explicable, you just have to find the explanation.
Express Intro to dplyr Working The Data Like a Boss! I recently introduced the data.table package, which provides a nice way to manage and aggregate large data sources using the standard bracket notation that is commonly employed when manipulating data frames in R. As data sources grow larger, one must be prepared with a variety of approaches to handle this information efficiently. Using databases (both SQL and NoSQL) is a possibility, wherein one queries for a subset of information, although this assumes that the database already exists or that you are prepared to create it yourself. The dplyr package offers ways to read in large files, interact with databases, and accomplish aggregation and summary. Some feel that dplyr is a competitor to the data.table package, though I do not share that view.
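A minimal sketch of the aggregation and summary use case mentioned in this entry. The choice of the built-in mtcars data and the grouping by cyl are assumptions for illustration. Because dplyr is an add-on package, the dplyr part is guarded and only runs if it is installed; a base R aggregate() call computes the same group means for comparison.

```r
# Base R reference: mean mpg per number of cylinders
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(base_means)

# The same summary written with dplyr verbs, if the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  dplyr_means <- mtcars %>%
    group_by(cyl) %>%
    summarise(mpg = mean(mpg))
  print(dplyr_means)

  # Both approaches should agree on the group means
  stopifnot(isTRUE(all.equal(dplyr_means$mpg, base_means$mpg)))
}
```

The group_by() and summarise() pair reads as a pipeline of steps, which is the style the entry contrasts with data.table's bracket notation.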
Percentile The nth percentile of an observation variable is the value that cuts off the first n percent of the data values when it is sorted in ascending order. Problem Find the 32nd, 57th and 98th percentiles of the eruption durations in the data set faithful. Solution We apply the quantile function to compute the percentiles of eruptions with the desired percentage ratios.
Random forests - classification description Contents Introduction Overview Features of random forests Remarks How Random Forests work The oob error estimate Variable importance Gini importance Interactions Proximities Scaling Prototypes Missing values for the training set Missing values for the test set Mislabeled cases Outliers Unsupervised learning Balancing prediction error Detecting novelties A case study - microarray data Classification mode Variable importance Using important variables Variable interactions Scaling the data Prototypes Outliers A case study - dna data Missing values in the training set Missing values in the test set Mislabeled cases Case Studies for unsupervised learning Clustering microarray data Clustering dna data Clustering glass data Clustering spectral data References Introduction This section gives a brief overview of random forests and some comments about the features of the method. Overview We assume that the user knows about the construction of single classification trees.
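For the Percentile entry, the solution it describes (applying quantile() to the eruptions column of the built-in faithful data set with the three requested ratios) can be sketched as follows. The variable names duration and pct are illustrative.

```r
# Eruption durations from the built-in faithful data set
duration <- faithful$eruptions

# 32nd, 57th and 98th percentiles, given as proportions
pct <- quantile(duration, c(0.32, 0.57, 0.98))
print(pct)
```

quantile() takes proportions between 0 and 1 rather than percentages, so the 32nd percentile is requested as 0.32.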