Personal R Packages | Not So Standard Deviations I came across this R package on GitHub, and it made me so excited that I decided to write a post about it. It’s a compilation by Karl Broman of various R functions that he’s found helpful to write throughout the years. Wouldn’t it be great if incoming graduate students in Biostatistics/Statistics were taught to create a personal repository of functions like this? Not only is it a great way to learn how to write an R package, but it also encourages good coding techniques for newer students (since it prompts them to write separate functions with documentation). It also allows for easy reproducibility and collaboration both within the school and with the broader community. (Note that install_github is a function in the devtools package.) For whatever reason, when I think of R packages, I think of big, unified projects with a specified scientific aim.
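As a minimal sketch of the kind of function such a personal package might hold, the example below defines one small helper with roxygen2-style documentation comments. The function name `se_mean` and its behavior are hypothetical illustrations and are not taken from Karl Broman's package.

```r
#' Standard error of the mean (hypothetical helper for a personal package)
#'
#' @param x Numeric vector; missing values are dropped before computing.
#' @return The sample standard deviation divided by sqrt(n).
se_mean <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  sd(x) / sqrt(length(x))    # sd over the root of the remaining sample size
}

se_mean(c(2, 4, 4, 4, 5, 5, 7, 9))
```

Keeping each helper this small, with its documentation next to the code, is what makes it easy to move into a package later.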
Principal Component Analysis A simple example Consider 100 students with Physics and Statistics grades shown in the diagram below. The data set is in marks.dat. If we want to compare the students, which grade should be the better discriminating factor? Physics or Statistics? Surely Physics, since the variation is larger there. Here the direction of maximum variation is like a slanted straight line. dat = read.table("marks.dat",header=TRUE); dim(dat); names(dat); pc = princomp(~Stat+Phys,dat); pc$loadings Notice the somewhat non-intuitive syntax of the princomp function. R has returned two principal components. pc to learn the amount of spread of the data along the chosen directions. names(pc) We shall not go into all these here. pc$scores Higher dimensions Most statisticians consider PCA a tool for reducing the dimension of data. The same conclusion may be obtained by PCA. Putting it to action quas = read.table("SDSS_quasar.dat",header=TRUE); dim(quas); names(quas); quas = na.omit(quas); dim(quas) Now we shall apply PCA. to see them. plot(pc)
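The two-variable example can be sketched without the marks.dat file. The data below are simulated as a stand-in (the means, spreads, and seed are assumptions, not the tutorial's values); only the `princomp` call mirrors the one in the text.

```r
# Simulated stand-in for marks.dat: 100 students with correlated grades,
# Physics being more spread out than Statistics (an assumption).
set.seed(1)
ability <- rnorm(100)
dat <- data.frame(
  Stat = 60 + 5 * ability + rnorm(100, sd = 3),
  Phys = 55 + 12 * ability + rnorm(100, sd = 5)
)

# Formula interface: the right-hand side lists the variables to use.
pc <- princomp(~ Stat + Phys, data = dat)

pc$loadings        # directions of the two principal components
pc$sdev            # spread of the data along each direction
head(pc$scores)    # each student's coordinates along those directions
```

Because Physics varies more in this simulated set, the first component should lean mostly on Phys, which matches the "slanted straight line" picture described above.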
D G Rossiter - Publications & Computer Programs Rossiter, DG 2012. Introduction to the R Project for Statistical Computing for use at ITC 14-Aug-2012, v + 136 pp. (First version 2003) On-line, version 4.0 (3 Mb) Rossiter, DG 2014. Tutorial: Using the R Environment for Statistical Computing: An example with the Mercer-Hall wheat yield dataset Version 2.9, 09-Jan-2013. iv+234 pp.
Exploratory Data Analysis and Regression in R Exploratory Data Analysis (EDA) and Regression This tutorial demonstrates some of the capabilities of R for exploring relationships among two (or more) quantitative variables. Bivariate exploratory data analysis We begin by loading the Hipparcos dataset used in the descriptive statistics tutorial, found at hip <- read.table(" header=T,fill=T) names(hip); attach(hip) In the descriptive statistics tutorial, we considered boxplots, a one-dimensional plotting technique. boxplot(Vmag~cut(B.V,breaks=(-1:6)/2), notch=T, varwidth=T, las=1, tcl=.5, xlab=expression("B minus V"), ylab=expression("V magnitude"), main="Can you find the red giants?") The notches in the boxes, produced using "notch=T", can be used to test for differences in the medians (see boxplot.stats for details). Scatterplots plot(Vmag,B.V); plot(Vmag,B.V,pch=".") Let's now use exploratory scatterplots to locate the Hyades stars.
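To see what the notches encode, the grouped boxplot can be computed without drawing it. This is a rough sketch on simulated values that only borrow the column names `Vmag` and `B.V`; the numbers are made up and are not the Hipparcos data.

```r
# Simulated stand-in for two Hipparcos columns (NOT the real data).
set.seed(42)
n <- 700
B.V  <- runif(n, min = -0.5, max = 3)          # colour index
Vmag <- 8 + 0.5 * B.V + rnorm(n, sd = 1.5)     # magnitude, made-up relation

# Same grouping as the boxplot call in the text, but return the numbers
# instead of drawing the plot.
b <- boxplot(Vmag ~ cut(B.V, breaks = (-1:6) / 2), plot = FALSE)

b$n      # observations per bin
b$stats  # five-number summary per bin; row 3 is the median
b$conf   # lower and upper notch limits per bin
```

Two bins whose notch intervals in `b$conf` do not overlap give informal evidence that their medians differ, which is the comparison the notched plot makes visually.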
The R Project for Statistical Computing R Starter Kit R Starter Kit This page is intended for people who: These materials have been collected from various places on our website and have been ordered so that you can, in step-by-step fashion, develop the skills needed to conduct common analyses in R. Getting familiar with R Class notes: There is no point in waiting to take an introductory class on how to use R. Recommended Books Introducing R Getting familiar with the statistical procedures Textbook examples: We have examples from popular textbooks and worked them out using R. Going further Frequently Asked Questions: We have a list of frequently asked questions (FAQs) regarding R. The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.
R Programming Welcome to the R programming Wikibook This book is designed to be a practical guide to the R programming language. R is free software designed for statistical computing. How can you share your R experience? Explain the syntax of a command. Compare the different ways of performing each task using R. Try to make unique examples based on fake data (i.e., simulated data sets). As with any Wikibook, please feel free to make corrections, expand explanations, and make additions where necessary. Some rules: Prerequisites We assume that readers have a background in statistics. We also assume that readers are familiar with computers and that they know how to use software with a command-line interface. See also Larry Wasserman's book All of Statistics. The Statistics and the Econometric Theory wikibooks. The Econometrics and Statistics pages on Wikipedia. References
Home Page Machine Learning Repository Time Series Analysis In the following topics, we will first review techniques used to identify patterns in time series data (such as smoothing and curve fitting techniques and autocorrelations), then we will introduce a general class of models that can be used to represent time series data and generate predictions (autoregressive and moving average models). Finally, we will review some simple but commonly used modeling and forecasting techniques based on linear regression. For more information see the topics below. General Introduction In the following topics, we will review techniques that are useful for analyzing time series data, that is, sequences of measurements that follow non-random orders. Detailed discussions of the methods described in this section can be found in Anderson (1976), Box and Jenkins (1976), Kendall (1984), Kendall and Ord (1990), Montgomery, Johnson, and Gardiner (1990), Pankratz (1983), Shumway (1988), Vandaele (1983), Walker (1991), and Wei (1989). Two Main Goals Trend Analysis Where:
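Two of the pattern-identification techniques named above, smoothing and autocorrelation, can be sketched in base R. The series here is simulated (a linear trend plus noise, an assumption made purely for illustration), and the five-point window is an arbitrary choice.

```r
# Simulated series: linear trend plus random noise (illustrative only).
set.seed(7)
time_idx <- 1:120
x <- ts(0.05 * time_idx + rnorm(120))

# Smoothing: centred 5-point moving average; the first and last two
# values are NA because the window does not fit there.
ma5 <- stats::filter(x, rep(1 / 5, 5), sides = 2)

# Autocorrelation: sample autocorrelations up to lag 20, without plotting.
r <- acf(x, lag.max = 20, plot = FALSE)
r$acf[1:5]   # lag 0 is always 1; a trend keeps the next lags high
```

Plotting `x` with `lines(ma5)` on top shows the trend emerging once the noise is averaged out.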
Applied Time Series Analysis Introduction Model-based forecasting methods; autoregressive and moving average models; ARIMA, ARMAX, ARCH, and state-space models; estimation, forecasting and model validation; missing data; irregularly spaced time series; parametric and nonparametric bootstrap methods for time series; multiresolution analysis of spatial and time-series signals; and time-varying models and wavelets. [From: Course Description - Statistics Department, Rutgers University] Course Outline Part I: Basic Concept of Time Series Introduction, Regression Model vs. Textbook: Analysis of Financial Time Series, by Ruey S. Lectures Assignments Midterm [Sample Midterm Exam (2008)] [Solution] [Review] Final Exam [Sample Final Exam (2008)] [Solution] [Review] Past and Future Guang Yang, Ph.D. candidate, Department of Statistics, Hill Center Rm 557, Rutgers University.
Quick-R: Home Page