
R


Axes and Text. Many high-level plotting functions (plot, hist, boxplot, etc.) allow you to include axis and text options (as well as other graphical parameters). For example:

# Specify axis options within plot()
plot(x, y, main="title", sub="subtitle", xlab="X-axis label", ylab="y-axis label", xlim=c(xmin, xmax), ylim=c(ymin, ymax))

For finer control or for modularization, you can use the functions described below.

Titles. Use the title( ) function to add labels to a plot.

title(main="main title", sub="sub-title", xlab="x-axis label", ylab="y-axis label")

Many other graphical parameters (such as text size, font, rotation, and color) can also be specified in the title( ) function.

# Add a red title and a blue subtitle.
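As a minimal sketch of the title( ) usage described above, the following draws a plot without default annotation and then adds a red main title and a blue subtitle. The data (x, y) are placeholders invented for illustration, and pdf(NULL) is only there so the sketch runs without opening a window; in an interactive session it can be dropped.

```r
# Placeholder data, not from the original text
x <- 1:10
y <- x^2

# Null graphics device: nothing is written to disk (assumption: non-interactive run)
pdf(NULL)

# ann = FALSE suppresses the default title and axis labels,
# so title() does not print on top of them
plot(x, y, ann = FALSE)

# col.main and col.sub set the colours of the main title and subtitle
title(main = "main title", col.main = "red",
      sub = "sub-title", col.sub = "blue",
      xlab = "x-axis label", ylab = "y-axis label")

n_devices <- length(dev.list())  # number of open devices while plotting
dev.off()
```

Calling plot( ) with ann = FALSE first is a design choice: without it, the labels passed to title( ) would be drawn over the defaults.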

Text Annotations. Text can be added to graphs using the text( ) and mtext( ) functions. text( ) places text within the graph while mtext( ) places text in one of the four margins.

text(location, "text to place", pos, ...)
mtext("text to place", side, line=n, ...)

Common options are described below.

R - Generalized linear Models. 5 Generalized Linear Models. Generalized linear models are just as easy to fit in R as ordinary linear models. In fact, they require only an additional parameter to specify the variance and link functions. 5.1 Variance and Link Families. The basic tool for fitting generalized linear models is the glm function, which has the following general structure:

> glm(formula, family, data, weights, subset, ...)

where ... stands for more esoteric options.
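To make the glm( ) call structure above concrete, here is a small sketch that fits a binomial model twice, once with the default link and once with a probit link. The choice of the built-in mtcars data and of the variables am and wt is an assumption made purely for illustration; it is not the data set used in the original notes.

```r
# Binomial GLM with the family's default link (logit)
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)

# Same model, but requesting the probit link explicitly
fit_probit <- glm(am ~ wt, family = binomial(link = "probit"), data = mtcars)

# Inspect which link each fit used and the estimated coefficients
fit_logit$family$link
fit_probit$family$link
coef(fit_logit)
coef(fit_probit)
```

Only the family argument differs between the two calls, which is the point: the formula and data stay exactly as they would for lm( ).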

The only parameter that we have not encountered before is family, which is a simple way of specifying a choice of variance and link functions. As can be seen, each of the first five choices has an associated variance function (for binomial the binomial variance m(1-m)), and one or more choices of link functions (for binomial the logit, probit or complementary log-log). As long as you want the default link, all you have to specify is the family name. > glm( formula, family=binomial(link=probit)) 5.2 Logistic Regression > attach(cuse) 5.3 Updating Models. How to Deal with Missing Data Values in R. The cor() function in R can deal with missing data values in multiple ways. For that, you set the argument use to one of the possible text values. The value for the use argument is especially important if you calculate the correlations of the variables in a data frame.
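The effect of the use argument of cor( ) can be seen on a small data frame with missing values. This is a sketch with made-up numbers; the data frame df and its columns are hypothetical. It compares the default with two other accepted values, "complete.obs" and "pairwise.complete.obs".

```r
# Made-up data frame with one NA in column a and one in column c
df <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, 9, 10),
  c = c(5, NA, 3, 2, 1)
)

# Default: any NA in a variable makes its correlations NA
r_everything <- cor(df, use = "everything")

# Drop every row that contains an NA before computing anything
r_complete <- cor(df, use = "complete.obs")

# For each pair of variables, use the rows where both are present
r_pairwise <- cor(df, use = "pairwise.complete.obs")

r_everything
r_complete
r_pairwise
```

With a data frame the difference matters because "complete.obs" discards a row for all pairs, while "pairwise.complete.obs" can use a different set of rows for each pair.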

By setting this argument to different values, you can use all observations by setting use='everything'. This means that if there’s any NA value in one of the variables, the resulting correlation is NA as well. This is the default. In fact, you can calculate different measures of correlation.

R for beginners and intermediate users 3: plotting with colours. For my third post on my R tutorials for beginners and intermediate users, I shall finally touch on the subject matter that prompted me to start these tutorials - plotting with group structures in colour. If you are familiar with R, then you may have noticed that assigning group structure is not all that straightforward. You can have a dataset that may have a column specifically for group structure such as this: and you'd hope that there is an intuitive and easy way of specifying colour or grouping structure based on this last column.

The short answer is, yes, there is. But before we go into that, let's review simple plotting in R. Let's say we want to plot the first 2 principal components from a principal components analysis (PCA) based on the complete dataset introduced briefly above.

plot(PC1, PC2)

and it would give you a default plot that looks like this: The default plot setting is as shown above, with simple open circles.

plot(PC1, PC2, pch=19)
family <- as.factor(data[,4])

R devel - Wait for user input with readline(). This post has NOT been accepted by the mailing list yet. THIS IS A HOW-TO on HOW TO SCRIPT OR PROGRAM IN THE R-PROGRAMMING LANGUAGE A SIMPLE INPUT AND OUTPUT OF NUMERIC INFORMATION OR DATA. In other words, you are writing a program, which you want to save, and then later run from the console by using some source("My File") type deal. The program will ask you for numeric information, you will put in numeric data, and assuming you ask it to output (demonstrated), you will get the numeric data output. 1.

Think of a name. BOB <- function() 2. BOB <- function() { 3. BOB <- function() { as.numeric(readline("Please enter a number:>>> ")) 4. 6. 7. 8. 9. 10. 11. AMAZING. Boxplots. Boxplots can be created for individual variables or for variables by group. The format is boxplot(x, data=), where x is a formula and data= denotes the data frame providing the data. An example of a formula is y~group where a separate boxplot for numeric variable y is generated for each value of group.

Add varwidth=TRUE to make boxplot widths proportional to the square root of the sample sizes. Add horizontal=TRUE to reverse the axis orientation.

# Boxplot of MPG by Car Cylinders
boxplot(mpg~cyl, data=mtcars, main="Car Mileage Data", xlab="Number of Cylinders", ylab="Miles Per Gallon")

# Notched Boxplot of Tooth Growth Against 2 Crossed Factors
# boxes colored for ease of interpretation
boxplot(len~supp*dose, data=ToothGrowth, notch=TRUE, col=(c("gold","darkgreen")), main="Tooth Growth", xlab="Supplement and Dose")

In the notched boxplot, if two boxes' notches do not overlap this is ‘strong evidence’ their medians differ (Chambers et al., 1983, p. 62).
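The varwidth and horizontal options mentioned above are not exercised by the two examples, so here is a short sketch that uses both on the same mtcars data. The pdf(NULL) call is an assumption for non-interactive use and can be removed at the console. boxplot( ) also returns its summary statistics invisibly, which the sketch captures to show the group sizes that drive the box widths.

```r
pdf(NULL)  # null device; drop this line in an interactive session

# Horizontal boxplots whose widths scale with the square root of group size
b <- boxplot(mpg ~ cyl, data = mtcars,
             varwidth = TRUE, horizontal = TRUE,
             main = "Car Mileage Data")

dev.off()

b$names  # group labels (number of cylinders)
b$n      # number of observations in each group
```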

Colors recycle. R help archive: Re: [R] Correlation Mapping. I have not seen that book cover but I assume the question is how to plot the cells of a correlation matrix in different colors. Try heatmap or the gplots package function heatmap.2. For example, we create a correlation matrix, K, from the first 4 columns of the iris data set and create a heatmap using the bluered color scheme:

# heatmap.2
library(gplots)
K <- cor(iris[,1:4])
heatmap.2(K, col = bluered(16), cexRow = .7, cexCol = .7, symm = TRUE, dendrogram = "row", trace = "none", main = "Iris Data")

balloonplot, also in the gplots package, and image in graphics (i.e. core R) might be other functions to look at.
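For readers without gplots installed, the image function mentioned above can produce a similar colour-coded correlation map with base R only. This is a sketch rather than a replacement for heatmap.2: it draws no dendrogram, and the blue-white-red palette built with colorRampPalette is an assumption chosen to resemble bluered(16).

```r
K <- cor(iris[, 1:4])
n <- ncol(K)

# 16-step blue-to-red palette (stand-in for gplots::bluered)
pal <- colorRampPalette(c("blue", "white", "red"))(16)

pdf(NULL)  # null device; drop this line in an interactive session

# image() maps z[i, j] to position (x[i], y[j]); reversing the columns
# puts the diagonal from top-left to bottom-right, as in a printed matrix
image(1:n, 1:n, K[, n:1], col = pal, zlim = c(-1, 1),
      axes = FALSE, xlab = "", ylab = "", main = "Iris Data")
axis(1, at = 1:n, labels = rownames(K), cex.axis = 0.7)
axis(2, at = 1:n, labels = rev(colnames(K)), cex.axis = 0.7)

dev.off()
```

Fixing zlim to c(-1, 1) keeps the colour scale centred on zero, so white always means no correlation regardless of the data.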

On 7/16/06, justin rapp <jdrapp@gmail.com> wrote: > On the cover of Zivot and Wang's Modeling Financial Time Series with S > Plus, there is a correlation plot that seems to indicate the strength > of correlation with color-coded squares, so that more highly > correlated stocks appear darker red. Received on. Error in lm.fit( Multiple Regression. R provides comprehensive support for multiple linear regression. The topics below are provided in order of increasing complexity.

Fitting the Model

# Multiple Linear Regression Example
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results

# Other useful functions
coefficients(fit) # model coefficients
confint(fit, level=0.95) # CIs for model parameters
fitted(fit) # predicted values
residuals(fit) # residuals
anova(fit) # anova table
vcov(fit) # covariance matrix for model parameters
influence(fit) # regression diagnostics

Diagnostic Plots

Diagnostic plots provide checks for heteroscedasticity, normality, and influential observations.

# diagnostic plots
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(fit)

For a more comprehensive evaluation of model fit see regression diagnostics.

Comparing Models You can compare nested models with the anova( ) function. Cross Validation You can assess R2 shrinkage via K-fold cross-validation. Variable Selection. Mann-Whitney-Wilcoxon Test. Two data samples are independent if they come from distinct populations and the samples do not affect each other. Using the Mann-Whitney-Wilcoxon Test, we can decide whether the population distributions are identical without assuming them to follow the normal distribution. Example In the data frame column mpg of the data set mtcars, there are gas mileage data of various 1974 U.S. automobiles. > mtcars$mpg [1] 21.0 21.0 22.8 21.4 18.7 ... Meanwhile, another data column in mtcars, named am, indicates the transmission type of the automobile model (0 = automatic, 1 = manual).
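The comparison of the mpg and am columns just described can be sketched with wilcox.test( ) from base R. This is one plausible way to run the test, not necessarily the original page's solution. The suppressWarnings( ) wrapper is an assumption about presentation: mpg contains tied values, so R falls back to a normal approximation and warns that an exact p-value cannot be computed.

```r
# Mann-Whitney-Wilcoxon test of mpg between the two transmission types.
# Ties in mpg trigger a warning about exact p-values, suppressed here.
res <- suppressWarnings(wilcox.test(mpg ~ am, data = mtcars))

res$p.value

# Decision at the conventional 0.05 significance level
reject_null <- res$p.value < 0.05
reject_null
```

If reject_null is TRUE, the hypothesis that mileage for manual and automatic transmissions follows the same distribution is rejected at that level.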

> mtcars$am [1] 1 1 1 0 0 0 0 0 ... In particular, the gas mileage data for manual and automatic transmissions are independent. Problem Without assuming the data to have normal distribution, decide at .05 significance level if the gas mileage data of manual and automatic transmissions in mtcars have identical data distribution. Solution Answer. Saving plot to tiff, with high resolution for publication ? A GUI for R - Deducer Manual. An R Graphical User Interface (GUI) for Everyone Deducer is designed to be a free easy to use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an excel-like spreadsheet in which to view and edit data frames.

The goal of the project is twofold. Provide an intuitive graphical user interface (GUI) for R, encouraging non-technical users to learn and perform analyses without programming getting in their way. Deducer is designed to be used with the Java based R console JGR, though it supports a number of other R environments.

Extension Packages DeducerExtras An add-on package containing a variety of additional analysis dialogs.

DeducerPlugInScaling Reliability and factor analysis DeducerMMR RDSAnalyst. An introductory tour of Deducer. Cor.test for a correlation matrix. Position_stack. had.co.nz. R help archive: Re: [R] Plotting a simple subset. To be fair, none of Introduction to R, ?plot nor the reference card really covers this without substantial digging.

# test data
x <- 1:10
y <- x*x
plot(x[x > 5], y[x > 5])
# or
plot(y ~ x, subset = x > 5)
# We can combine conditions like this:
plot(y ~ x, subset = x > 5 & y < 50)
# also if your intention was really to set the plot limits rather than
# condition on the data then you can use xlim= and ylim=, e.g.
plot(y ~ x, xlim = c(5, max(x)))

Read over all of these: ?

On 7/8/05, Berton Gunter <gunter.berton@gene.com> wrote:
> Please first read "An Introduction to R" (one of the pdf manuals that ships
> with R) before posting these sorts of questions, as it is written
> specifically to help you get started (I think fairly clearly).
>
> Other (links to) learning resources may be found on the CRAN website.

Akaike's An Information Criterion. Description: Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).

Usage AIC(object, ..., k = 2) BIC(object, ...) Arguments Details When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit. The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared. Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.
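The formula -2*log-likelihood + k*npar can be checked by hand against AIC( ) and BIC( ). The sketch below does this for a linear model; the model itself (mpg on wt and hp in mtcars) is an arbitrary choice made for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(fit)
npar <- attr(ll, "df")  # parameters counted by logLik: coefficients plus error variance
n <- nobs(fit)          # number of observations used in the fit

# k = 2 gives the usual AIC; k = log(n) gives BIC
aic_manual <- -2 * as.numeric(ll) + 2 * npar
bic_manual <- -2 * as.numeric(ll) + log(n) * npar

all.equal(aic_manual, AIC(fit))
all.equal(bic_manual, BIC(fit))
all.equal(bic_manual, AIC(fit, k = log(n)))
```

Note that npar includes the residual variance, which is why it is one larger than the number of regression coefficients.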

Value See Also. Error message: "The following object(s) are masked" Input and Output. [R] remove multiple columns by name from dataframe. PCA on Correlation or Covariance? - Statistical Analysis - Stack Exchange. Plot 2 graphs in same plot in R. Vocabulary - GitHub. Getting rid of axis values in R Plot. R: Pairwise linkage disequilibrium between genetic markers.

LD {genetics} R Documentation

Description
Compute pairwise linkage disequilibrium between genetic markers

Usage
LD(g1, ...)
## S3 method for class 'genotype':
LD(g1, g2, ...)
## S3 method for class 'data.frame':
LD(g1, ...)

Arguments
g1: genotype object or dataframe containing genotype objects
g2: genotype object (ignored if g1 is a dataframe)
...: optional arguments (ignored)

Details
Linkage disequilibrium (LD) is the non-random association of marker alleles and can arise from marker proximity or from selection bias. LD.genotype estimates the extent of LD for a single pair of genotypes. Three estimators of LD are computed:
D: raw difference in frequency between the observed number of AB pairs and the expected number
D': scaled D spanning the range [-1,1], where the scaling depends on whether D > 0 or D < 0
r: correlation coefficient between the markers

Value
LD.genotype returns a 5 element list:
call: the matched call
D: Linkage disequilibrium estimate
Dprime: Scaled linkage disequilibrium estimate
corr: Correlation coefficient
nobs: Number of observations
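The formulas for the three estimators did not survive on this page, so the sketch below uses the common textbook definitions of D, D' and r as an assumption; the package's own formulas and sign conventions may differ. It also starts from known haplotype frequencies, which are made up here, whereas LD( ) works from genotype objects. Treat it as an illustration of the quantities, not as the package's implementation.

```r
# Hypothetical haplotype frequencies for two biallelic markers (A/a and B/b)
p_AB <- 0.5
p_Ab <- 0.1
p_aB <- 0.1
p_ab <- 0.3

# Marginal allele frequencies
p_A <- p_AB + p_Ab
p_B <- p_AB + p_aB
p_a <- 1 - p_A
p_b <- 1 - p_B

# Raw disequilibrium: observed AB frequency minus expectation under independence
D <- p_AB - p_A * p_B

# Largest magnitude D can take given the allele frequencies;
# which bound applies depends on the sign of D (textbook convention)
D_max <- if (D > 0) min(p_A * p_b, p_a * p_B) else min(p_A * p_B, p_a * p_b)

# Scaled D in [-1, 1]
D_prime <- D / D_max

# Correlation coefficient between the two markers
r <- D / sqrt(p_A * p_a * p_B * p_b)

c(D = D, Dprime = D_prime, r = r)
```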

Current - Producing Simple Graphs with R.