
R Programming - Manuals

R Basics: The R & BioConductor manual provides a general introduction to the usage of the R environment and its basic command syntax. Code Editors for R: Several excellent code editors are available that provide functionality such as R syntax highlighting, automatic code indentation, and utilities to send code or functions to the R console: Programming in R using Vim or Emacs; Programming in R using RStudio. Integrating R with Vim and Tmux: Users interested in integrating R with Vim and tmux may want to consult the Vim-R-Tmux configuration page. Finding Help: Reference list on R programming (selection): R Programming for Bioinformatics, by Robert Gentleman; Advanced R, by Hadley Wickham; S Programming, by W. Control Structures: Conditional Executions. Comparison Operators: equal: ==; not equal: ! Logical Operators. If Statements: If statements operate on length-one logical vectors. Syntax: if(cond1=true) { cmd1 } else { cmd2 } Example: if(1==0) { print(1) } else { print(2) } [1] 2 Avoid inserting a newline between '}' and 'else'. Loops: Syntax
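The note that if statements operate on length-one logical vectors can be illustrated with a minimal sketch. The vector x, the threshold of 10, and the names msg and labels are illustrative assumptions, not taken from the manual.

```r
# Hypothetical data: three measurements
x <- c(5, 12, 8)

# if() expects a single TRUE/FALSE, so the vector comparison is
# reduced to one value with any() before it is tested.
if (any(x > 10)) {
  msg <- "at least one value exceeds 10"
} else {   # '} else' stays on one line, as the manual advises
  msg <- "no value exceeds 10"
}
print(msg)

# For an element-wise decision, the vectorised ifelse() is one option.
labels <- ifelse(x > 10, "high", "low")
print(labels)
```

Reducing with any() or all() keeps the condition at length one; ifelse() is the usual choice when a result is wanted for every element.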

Rtips. Revival 2014! Paul E. Johnson <pauljohn @ ku.edu> The original Rtips started in 1999. It became difficult to update because of limitations in the software with which it was created. Now I know more about R, and have decided to wade in again. In January, 2012, I took the FaqManager HTML output and converted it to LaTeX with the excellent open source program pandoc, and from there I’ve been editing and updating it in LyX. You are reading the New Thing! The first chore is to cut out the old useless stuff that was no good to start with and to correct mistakes in translation (the quotation mark translations are particularly dangerous, but there is also trouble with ~, $, and -). (I thought it was cute to call this “StatsRus” but the Toystore’s lawyer called and, well, you know…) If you need a tip sheet for R, here it is. This is not a substitute for R documentation, just a list of things I had trouble remembering when switching from SAS to R. Heed the words of Brian D. 1.1 Bring raw numbers into R (05/22/2012) Step 1.

60+ R resources to improve your data skills This list was originally published as part of the Computerworld Beginner's Guide to R but has since been expanded to also include resources for advanced beginner and intermediate users. If you're just starting out with R, I recommend first heading to the Beginner's Guide. These websites, videos, blogs, social media/communities, software and books/ebooks can help you do more with R; my favorites are listed in bold. Books and e-books R Cookbook. R Graphics Cookbook. R in Action: Data analysis and graphics with R. The Art of R Programming. R in a Nutshell. Visualize This. R For Dummies. Introduction to Data Science. R for Everyone. Statistical Analysis With R: Beginner's Guide. Reproducible Research with R and RStudio. Exploring Everyday Things with R and Ruby. Online references 4 data wrangling tasks in R for advanced beginners. Data manipulation tricks: Even better in R. Cookbook for R. Quick-R. Videos

Subsetting · Advanced R. R’s subsetting operators are powerful and fast. Mastery of subsetting allows you to succinctly express complex operations in a way that few other languages can match. Subsetting is hard to learn because you need to master a number of interrelated concepts: the three subsetting operators; the six types of subsetting; important differences in behaviour for different objects (e.g., vectors, lists, factors, matrices, and data frames); and the use of subsetting in conjunction with assignment. This chapter helps you master subsetting by starting with the simplest type of subsetting: subsetting an atomic vector with [. It then gradually extends your knowledge, first to more complicated data types (like arrays and lists), and then to the other subsetting operators, [[ and $. Subsetting is a natural complement to str(). str() shows you the structure of any object, and subsetting allows you to pull out the pieces that you’re interested in. Quiz. Outline: Data types starts by teaching you about [.
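The three operators named above ([, [[ and $) and the pairing with str() can be shown in a short sketch. The objects x and lst and their contents are made up for illustration and are not taken from the chapter.

```r
# Atomic vector with names (illustrative values)
x <- c(a = 2.1, b = 4.2, c = 3.3)

x[c(1, 3)]   # positive integers: keep elements 1 and 3
x[-1]        # negative integers: drop element 1
x[x > 3]     # logical vector: keep elements where the test is TRUE
x["b"]       # character vector: select by name

# A list, to contrast [ with [[ and $
lst <- list(id = 1:3, label = "demo")

str(lst)         # shows the structure: an integer vector and a string
lst["id"]        # [ returns a list of length one
lst[["id"]]      # [[ returns the element itself (an integer vector)
lst$label        # $ is shorthand for [["label"]]

# Subsetting combined with assignment
x[x > 3] <- 0
x
```

The key contrast is that [ preserves the container (a list stays a list), whereas [[ and $ extract a single element from it.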

Medley: a new R package for blending regression models. Hi Sashi, Sorry for the muddled explanation. What I was trying to say is, if you give Martin's medley package a tuning grid, it will fit a model to each parameter set in the grid, and then include ALL the models in the final ensemble. For example, let's say you fit a random forest model with an mtry of 2, 4, and 8, and a knn model with k of 10, 15, and 20. If you wanted to include all 6 models in the ensemble, you would need to separately fit 6 caret models for mtry=2, mtry=4, mtry=8, and k=10, k=15, and k=20. Does this make sense? -Zach Plot maps like a boss: A new package, OpenStreetMap, has been released to CRAN this week, designed to let you easily add satellite imagery or open street maps to your plots. Raster maps are a great way to add context to your spatial data with a minimum outlay of effort. The syntax in OpenStreetMap is fairly simple: just give it a bounding box in lat/long and it will download a high quality raster image ready for plotting: library(OpenStreetMap); library(rgdal); map <- openmap(c(70,-179), c(-70,179)); plot(map) The above code downloads multiple map tiles and stitches them together, with the level of zoom determined automatically. We can also access satellite imagery through Bing: map <- openmap(c(70,-179), c(-70,179), type='bing'); plot(map) Now, that is all fine and dandy, but kind of useless unless you are able to combine it with your own data. In terms of combining maps with your data there are two options. We may also want to go the other way and transform the image.
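The counting argument in the medley reply (three mtry values plus three k values give six separate models) can be made concrete with a base-R sketch. It only enumerates the model specifications; it does not call medley or caret, and the data frame layout and column names are assumptions made for illustration.

```r
# One row per model that would enter the ensemble (hypothetical layout)
rf_specs <- data.frame(method = "rf", parameter = "mtry",
                       value = c(2, 4, 8), stringsAsFactors = FALSE)
knn_specs <- data.frame(method = "knn", parameter = "k",
                        value = c(10, 15, 20), stringsAsFactors = FALSE)

ensemble_specs <- rbind(rf_specs, knn_specs)

# Each row corresponds to one separately fitted model
for (i in seq_len(nrow(ensemble_specs))) {
  spec <- ensemble_specs[i, ]
  cat(sprintf("model %d: %s with %s = %g\n",
              i, spec$method, spec$parameter, spec$value))
}

nrow(ensemble_specs)
```

In a real workflow each row would be passed to the fitting routine in turn; the loop body here is only a placeholder for that step.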

Onepager Now with knitR
\documentclass[nohyper,justified]{tufte-handout}
%\documentclass{article}
%\usepackage[absolute,showboxes]{textpos}
\usepackage[absolute]{textpos}
\usepackage{sidecap}
%\usepackage{color}
%\usepackage[usenames,dvipsnames,svgnames,table]{xcolor}
\begin{document}
<<include=FALSE>>=
opts_chunk$set(concordance=TRUE)
\begin{wide}
\section{\Huge Performance Summary with knitR and R}
{\Large Here is a little experiment with R and Sweave to produce a performance report.
\hrulefill
\end{wide}
<<eval=TRUE,echo=FALSE,results='hide',warning=FALSE,message=FALSE,error=FALSE>>=
#do requires and set up environment for reporting
require(xtable)
require(ggplot2)
require(directlabels)
require(reshape2)
require(latticeExtra)
require(quantmod)
require(PerformanceAnalytics)
data(managers)
#get xts in df form so that we can melt with the reshape package
#will use just manager 1, sp500, and 10y treasury
managers <- managers[,c(1,8,9)]
#add 0 at beginning so cumulative returns start at 1
managers.melt <- melt(managers.df,id.vars=1)
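The comment "add 0 at beginning so cumulative returns start at 1" refers to a small arithmetic trick that can be sketched in base R. The return values and the names returns, returns0 and cumulative are invented for illustration; the original report works on the managers data set instead.

```r
# Hypothetical periodic returns (2%, -1%, 3%)
returns <- c(0.02, -0.01, 0.03)

# Prepending a zero return adds a starting period with growth factor 1
returns0 <- c(0, returns)

# Cumulative growth of one unit invested: product of (1 + r) over time
cumulative <- cumprod(1 + returns0)
print(cumulative)
```

Without the leading zero the first plotted value would already include the first period's return, so lines for different series would not share a common starting point of 1.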

R tells you where weapons go. As an amateur programmer (one without proper training in any mainstream programming language such as C or Java), the more I use R the more I understand the saying “You are only bounded by your imagination”. The other day I suddenly recalled that someone did a very impressive Facebook map. I then thought it would be nice if I could put these “flows” on the map created in my first post (or one of the same sort). So, I googled around and found this brilliant blog that teaches you how to make flows (great circles) step by step. Again, thanks to R, its great community and its openness, I created the following map of international weapon exports in 2010 (from the top 7 exporters).

R FAQ Frequently Asked Questions on R Version 3.1.2014-04-05 Table of Contents 1 Introduction This document contains answers to some of the most frequently asked questions about R. 1.1 Legalese This document is copyright © 1998–2014 by Kurt Hornik. This document is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Copies of the GNU General Public License versions are available at 1.2 Obtaining this document The latest version of this document is always available from From there, you can obtain versions converted to plain ASCII text, GNU info, HTML, PDF, as well as the Texinfo source used for creating all these formats using the GNU Texinfo system. 1.4 Notation 2 R Basics

Mapping the World’s Biggest Airlines The map above shows the routes flown by the top 7 airlines (by international passenger distance flown). The base map shows large urban areas and I have attempted to make it look a bit like the beautiful “Earth at Night” composite image produced by NASA. You can clearly see a relationship between where people live and where the big carriers fly to across Europe and the US, but India and much of China have relatively few routes. I expect much of the slack is picked up by smaller airlines in these countries, but they must represent key growth areas as the world economy becomes increasingly driven by the east. This map isn’t meant to be comprehensive; I just wanted to make another example of a visualisation with ggplot2. How I did it: Plotting great circles has become an increasingly popular thing to do with R (because they look cool) and the excellent flight path data freely available from the OpenFlights website provides a neat data source to play around with. Get a world map worldmap
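For readers wondering what "plotting great circles" involves, here is a base-R sketch that interpolates points along the great circle between two locations. It is not the code from the post: the function names and the airport coordinates are illustrative assumptions, and packages such as geosphere (gcIntermediate) offer a ready-made equivalent.

```r
to_rad <- function(deg) deg * pi / 180
to_deg <- function(rad) rad * 180 / pi

# Convert longitude/latitude in degrees to a unit vector on the sphere
lonlat_to_xyz <- function(lon, lat) {
  lon <- to_rad(lon)
  lat <- to_rad(lat)
  c(cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat))
}

# n points along the great circle from (lon1, lat1) to (lon2, lat2),
# using spherical linear interpolation between the two unit vectors
great_circle_points <- function(lon1, lat1, lon2, lat2, n = 50) {
  a <- lonlat_to_xyz(lon1, lat1)
  b <- lonlat_to_xyz(lon2, lat2)
  omega <- acos(max(-1, min(1, sum(a * b))))  # central angle
  if (omega < 1e-12 || abs(omega - pi) < 1e-12) {
    stop("start and end must be distinct and not antipodal")
  }
  fractions <- seq(0, 1, length.out = n)
  pts <- t(sapply(fractions, function(frac) {
    (sin((1 - frac) * omega) * a + sin(frac * omega) * b) / sin(omega)
  }))
  z <- pmax(-1, pmin(1, pts[, 3]))  # guard asin() against rounding
  data.frame(lon = to_deg(atan2(pts[, 2], pts[, 1])), lat = to_deg(asin(z)))
}

# Approximate coordinates for a London to New York route (assumed values)
route <- great_circle_points(-0.45, 51.47, -73.78, 40.64, n = 5)
print(route)
```

The resulting data frame can be passed to any line-drawing function; routes that cross the 180th meridian would need to be split before plotting, which this sketch does not handle.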

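Econometric time series analysis with R: things to keep in mind at the start - Blog of a Data Scientist Working in Ginza. Machine learning is not my specialty at all, so rather than pretend to know it*1, I will fall back on a topic I have been working on for much longer*2: econometric time series analysis (lol). I am getting tired of repeating this myself (lol), but this series is based on the following two textbooks: the Okimoto text and the Hamilton text*3. Why model time series data in the first place? This point is easily forgotten, but the premise, or hypothesis, is that if data change over time and have some kind of "internal structure", then that structure should be expressible in some formal way. At first glance, all kinds of time series appear to move up and down at random. If they really were random, there would be little point in analysing them. But stock prices are driven underneath by factors such as investors' "business sentiment", "tactical manoeuvring" and "mood"; the DAU and paying-UU counts of social games are driven by factors such as user "satisfaction" and "the draw of events"; and the CVR of online advertising is driven by factors such as "customer satisfaction" and "the strength of the product appeal". The purpose of time series analysis can be summarised as follows: "building a model that can describe the diverse features of time series data" (Okimoto, p. 4), including such hidden internal structure; "clarifying the dynamic relationships among variables" (Okimoto, p. 4: in short, deriving the time-varying correlation, causality and dependence between different series); and "testing various economic and business hypotheses and theories on the basis of those time series models". R packages used for econometric time series analysis: Most of this is summarised in the CRAN Task View: Time Series Analysis, so if you are in a hurry, read that first (lol). I have also tried {arfima}, {fracdiff}, {dlm}, {timsac} and {MSBVAR}, but I have not yet used them in real work, so I omit them here. Basics: types of time series data, stationary processes, white noise, autocorrelation, etc. Frankly, this is copied almost straight from the Okimoto book*4, for which I apologise (lol), but it matters, so I will write it out anyway. Types of time series data. Stationary processes.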
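The point that purely random data leave little to analyse, together with the listed basics of white noise and autocorrelation, can be illustrated with base R. This is a sketch under assumptions: the simulated series, the seed and the variable names are invented here, and the random walk is added only as a contrasting example that is not taken from the post.

```r
set.seed(123)
n <- 1000

# White noise: independent draws with no internal structure
white_noise <- rnorm(n)

# Random walk: cumulative sum of the same shocks, so each value
# depends strongly on the previous one
random_walk <- cumsum(white_noise)

# Sample autocorrelations at lags 1 to 5 (element 1 of $acf is lag 0)
acf_wn <- acf(white_noise, lag.max = 5, plot = FALSE)$acf[-1]
acf_rw <- acf(random_walk, lag.max = 5, plot = FALSE)$acf[-1]

print(round(acf_wn, 3))
print(round(acf_rw, 3))
```

The white-noise autocorrelations should sit close to zero, while those of the random walk stay close to one; that persistence is the kind of structure a time series model tries to capture.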

Coming of Age: R and Spatial Data Visualisation I have been using R (a free statistics and graphics software package) for the past four years or so and I have seen it become an increasingly powerful method of both analysing and visualising spatial data. Crucially, more and more people are writing accessible tutorials for beginners and intermediate users, and the development of packages such as ggplot2 has made it simpler than ever to produce fantastic graphics. You don’t get the interactivity you would with conventional GIS software such as ArcGIS when you produce the visualisation, but you are much more flexible in terms of the combinations of plot types and the ease with which they can be combined. It is, for example, time consuming to produce multivariate symbols (such as those varying in size and colour) in ArcGIS, but with R it is as simple as one line of code. I have, for example, been able to add subtle transitions in the lines of the migration map above.
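The claim that symbols varying in size and colour take one line of code can be sketched with base graphics. The post itself points to ggplot2; base plot() is used here only to keep the sketch free of extra packages, and the data frame pts with its columns is invented for illustration.

```r
set.seed(42)

# Hypothetical point data: coordinates, a size variable and a group
pts <- data.frame(
  x     = runif(20),
  y     = runif(20),
  size  = runif(20, min = 1, max = 4),
  group = sample(1:3, 20, replace = TRUE)
)
palette_cols <- c("steelblue", "tomato", "darkgreen")

# The one line: symbol size mapped to 'size', colour mapped to 'group'
plot(pts$x, pts$y, pch = 19, cex = pts$size, col = palette_cols[pts$group])
```

The cex argument scales each symbol individually and col takes one colour per point, which is what makes the multivariate symbol a single call.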
