This is the in-progress book site for "Advanced R development". The book is designed primarily for R users who want to improve their programming skills and understanding of the language. It should also be useful for programmers coming to R from other languages, as it explains some of R's quirks and shows how some parts that seem horrible do have a positive side. It will eventually be published as a real book in Chapman and Hall's R series. The final version of the book is due in June 2014, so it should be available in late 2014. Thanks to the publisher, the wiki will continue to be freely available after the book is published.
git/github guide
All statistical/computational scientists should use git and github, but it can be hard to get started. I hope these pages help. (More blather below.) There are many resources for git and github; my aim is to provide the minimal guide to get started. I love git and github.
Mining of Massive Datasets
The book has now been published by Cambridge University Press. The publisher is offering a 20% discount to anyone who buys the hardcopy here. By agreement with the publisher, you can still download it free from this page.
Building an R Hadoop System - RDataMining.com: R and Data Mining
This page shows how to build an R Hadoop system and presents the steps I followed to set up my first R Hadoop system in single-node mode on Mac OS X. After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about two weeks, I finally built my first R Hadoop system and successfully ran some R examples on it. Here I’d like to share my experience and the steps to achieve that. Hopefully it will make it easier for R users who are new to Hadoop to try RHadoop. Note that I tried this on Mac only, and some steps might be different on Windows. Before going through the complex steps below, let’s look at what you can get, to give you some motivation to continue.
Handling Large Datasets In R
Handling large datasets in R, especially CSV data, was briefly discussed before in Excellent free CSV splitter and Handling Large CSV Files in R. My file at that time was around 2GB, with 30 million rows and 8 columns. Recently I started to collect and analyze US corporate bond tick data from 2002 to 2010, and the CSV file I got is 6.18GB with 40 million rows, even after removing biased data as described in Biases in TRACE Corporate Bond Data. How to proceed efficiently? Below is an excellent presentation on handling large datasets in R by Ryan Rosario, and here is a short summary of the presentation:
1. R has a few packages for big data support.
All about the position: Data scientist
Teradata Aster is seeking experienced individuals with demonstrated capability in the applied analytics and/or data science space. Proficiency in data manipulation, analytic algorithms, advanced math, and/or statistical modeling is required, and application development experience is a plus. We are looking for exceptional individuals to join our Professional Services team as Analytic Data Scientists. This client-facing role will be engaged in the design and deployment of solutions.
An R "meta" book
by Joseph Rickert
I am a book person. I collect books on all sorts of subjects that interest me, and consequently I have a fairly extensive collection of R books, many of which I find to be of great value. Nevertheless, when I am asked to recommend an R book to someone new to R, I am usually flummoxed. R is growing at a fantastic rate, and people coming to R for the first time span a wide range of sophistication. And besides, owning a book is kind of personal.
RStudio Server Amazon Machine Image (AMI) - Louis Aslett
Current AMI quick reference (27th Jun 2015)
What’s new recently? Easy Dropbox setup to make syncing files on/off the server easy, including selective folder sync. Preinstalled RStudioAMI R package for server control.
New release: Choroplethr v3.2.0 - AriLamstein.com
Today I am happy to announce that a new version of choroplethr, v3.2.0, is now available. You can get it by typing the following from an R console: install.packages("choroplethr")
Data Science Bootcamp - 12-week career prep
New York City in-person instruction + ongoing career coaching + job placement support. Winter Bootcamp: January 12, 2015 - April 3, 2015. Application period closed.
ONLINE OPEN-ACCESS TEXTBOOKS
Forecasting: principles and practice, Rob J Hyndman and George Athanasopoulos
Statistical foundations of machine learning
Highland Statistics Ltd
Blog - AriLamstein.com
Today’s guest post is by Julia Silge. After reading Julia’s analysis of religions in America (“This is the Place, Apparently”), I invited her to teach my readers how to map information about US Religious Adherence by County in R. Julia can be found blogging here or on Twitter. I took Ari’s free email course for getting started with the choroplethr package last year, and I have so enjoyed making choropleth maps and using them to explore demographic data. Earlier this month, I posted a project on my blog exploring the religious demographics of my adopted home state of Utah that made heavy use of the choroplethr package, and today I’m happy to share some of the details of the data set I used here on Ari’s blog and do some new analysis.
The Analytics Edge
In the last decade, the amount of data available to organizations has reached unprecedented levels. Data is transforming business, social interactions, and the future of our society. In this course, you will learn how to use data and analytics to give an edge to your career and your life. We will examine real world examples of how analytics have been used to significantly improve a business or industry.