R Resources

Create smooth animations in R with the tweenr package. There are several tools available in R for creating animations (movies) from statistical graphics.

Create smooth animations in R with the tweenr package

The animation package by Yihui Xie will create an animated GIF or video file, using a series of R charts you generate as the frames. And the gganimate package by David Robinson is an extension to ggplot2 that will create a movie from charts created using the ggplot2 syntax: in much the same way that you can create multiple panels using faceting, you can instead create an animation with multiple frames. But from a storytelling perspective, such animations can sometimes seem rather disjointed.

For instance, here's the example (from the gganimate documentation) of creating an animated bubble chart from the gapminder data. (NOTE: to use the gganimate package, you will need to install ImageMagick.) The data are all there, but the presentation lacks the style of the original Gapminder presentation, which smoothly moved the bubbles between the known data points separated by five-year intervals. Coblis — Color Blindness Simulator.

If you are not suffering from a color vision deficiency, it is very hard to imagine what it is like to be colorblind.

Coblis — Color Blindness Simulator

The Color BLIndness Simulator can close this gap for you. Just play around with it and get a feeling for what it is like to have a color vision handicap. As all the calculations are made on your local machine, no images are uploaded to the server. Therefore you can use images as big as you like; there are no restrictions. The viridis color palettes. Unsupervised Learning and Text Mining of Emotion Terms Using R. Unsupervised learning refers to data science approaches that involve learning without prior knowledge about the classification of sample data.

Unsupervised Learning and Text Mining of Emotion Terms Using R

In Wikipedia, unsupervised learning has been described as “the task of inferring a function to describe hidden structure from ‘unlabeled’ data (a classification or categorization is not included in the observations)”. The overarching objectives of this post were to evaluate and understand the co-occurrence and/or co-expression of emotion words in individual letters, and to determine whether there were any differential expression profiles/patterns of emotion words among the 40 annual shareholder letters.

Differential expression of emotion words refers here to quantitative differences in emotion word frequency counts among letters, as well as qualitative differences in certain emotion words occurring uniquely in some letters but not present in others. The dataset Analysis of emotions terms usage. niceOverPlot, or when the number of dimensions does matter.

Hi there!

niceOverPlot, or when the number of dimensions does matter

Over the last few months, my lab-mate Irene Villa (see more of her work here!) and I have been discussing ecological niche overlap. The niche concept dates back to ideas first proposed by ornithologist J. Grinnell (1917). Later on, G.E. Timekit: Time Series Forecast Applications Using Data Mining. The timekit package contains a collection of tools for working with time series in R.

timekit: Time Series Forecast Applications Using Data Mining

There are a number of benefits. One of the biggest is the ability to use a time series signature to predict future values (forecast) through data mining techniques. While this post is geared toward exposing the user to the timekit package, there are examples showing the power of data mining a time series as well as how to work with time series in general. A number of timekit functions will be discussed and implemented in the post. The first group of functions works with the time series index, and these include functions tk_index(), tk_get_timeseries_signature(), tk_augment_timeseries_signature() and tk_get_timeseries_summary(). R for Data Science. Happy Git and GitHub for the useR. Test driving Python integration in R, using the ‘reticulate’ package. Introduction: Not so long ago RStudio released the R package ‘reticulate’, an R interface to Python.

Test driving Python integration in R, using the ‘reticulate’ package

Of course, it was already possible to execute Python scripts from within R, but this integration takes it one step further. Imported Python modules, classes and functions can be called inside an R session as if they were native R functions. Below you’ll find some screenshot code snippets of using certain Python modules within R with the reticulate package. On my GitHub page you’ll find the R files from which these snippets were taken. How to store and use webservice keys and authentication details with R.

By Andrie de Vries (@RevoAndrie) I am frequently asked how you can safely store login details and passwords for use by R without exposing these details in your script.

How to store and use webservice keys and authentication details with R

Yesterday Jennifer Bryan asked this question on Twitter and a small storm of views and tweets erupted. A few minutes later she tweeted that there clearly is no consensus.

Different options

Reading the Twitter conversation, it seems to me there are several approaches:

1. Directly inside your script.
2. In a file in your project folder, that you don't share.
3. In a .Rprofile file.
4. In a .Renviron file.
5. Store the keys in a JSON file.
6. In a secure key store that you access from R.

Let's look at the key idea and benefits (or disadvantages) of each approach:

1. The first approach is to simply store your keys directly in your script: id <- "my login name"; pw <- "my password"; call_service(id, pw, ...)

2. The second option is almost just as easy to do.

Best practices for writing an API package. So you want to write an R client for a web API?
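Returning to the key-storage options listed above, the .Renviron route could be sketched roughly as follows. This is a minimal illustration rather than the article's own code: the variable names MY_SERVICE_ID and MY_SERVICE_PW and the helper get_credential() are hypothetical, and the Sys.setenv() call only stands in for entries that would normally live in a .Renviron file kept out of version control.

```r
# Hypothetical helper: read a credential from an environment variable and
# fail loudly if it is missing, so the script never silently uses "".
get_credential <- function(name) {
  value <- Sys.getenv(name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set", call. = FALSE)
  }
  value
}

# For demonstration only: in practice these lines would not appear in the
# script; the values would be defined in .Renviron (assumed names).
Sys.setenv(MY_SERVICE_ID = "my login name", MY_SERVICE_PW = "my password")

id <- get_credential("MY_SERVICE_ID")
pw <- get_credential("MY_SERVICE_PW")
# call_service(id, pw, ...)  # placeholder from the text, not a real function
```

The script itself then contains only variable names, so it can be shared without exposing the values.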

Best practices for writing an API package

This document walks through the key issues involved in writing API wrappers in R. If you’re new to working with web APIs, you may want to start by reading “An introduction to APIs” by zapier. Overall design APIs vary widely. Before starting to code, it is important to understand how the API you are working with handles important issues so that you can implement a complete and coherent R client for the API. The key features of any API are the structure of the requests and the structure of the responses.
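To make the request side concrete, here is a small base R sketch that assembles a request from a verb, a base URL, a path and query arguments. It is an assumption-laden illustration: build_request() is a hypothetical helper, api.example.com is a placeholder rather than a real service, and nothing is sent over the network.

```r
# Hypothetical helper: combine the pieces of an HTTP request into a verb
# and a fully assembled URL. No request is actually performed.
build_request <- function(verb, base_url, path, query = list()) {
  # Join base URL and path with exactly one slash between them
  url <- paste0(sub("/+$", "", base_url), "/", sub("^/+", "", path))
  if (length(query) > 0) {
    # Percent-encode each query value, then join as name=value pairs
    encoded <- vapply(
      query,
      function(v) utils::URLencode(as.character(v), reserved = TRUE),
      character(1)
    )
    url <- paste0(url, "?", paste(paste0(names(query), "=", encoded), collapse = "&"))
  }
  list(verb = toupper(verb), url = url)
}

# Placeholder endpoint and arguments (assumed for illustration only)
req <- build_request("get", "https://api.example.com/", "/v1/items",
                     query = list(state = "open", label = "needs review"))
print(req$url)
```

A real client would pass the verb and URL on to an HTTP library and add whatever authentication the API requires.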

- HTTP verb (GET, POST, DELETE, etc.)
- The base URL for the API
- The URL path or endpoint
- URL query arguments (e.g., ?

An API package needs to be able to generate these components in order to perform the desired API call, which will typically involve some sort of authentication. For example, to request that the GitHub API provides a list of all issues for the httr repo, we send an HTTP request that looks like:

Setting your working directory permanently in R. Fitting a rational function in R using ordinary least-squares regression. By Srini Kumar, VP of Product Management and Data Science, LevaData; and Bob Horton, Senior Data Scientist, Microsoft A rational function is defined as the ratio of two functions.

Fitting a rational function in R using ordinary least-squares regression

Take your data frames to the next level. UK government using R to modernize reporting of official statistics. Like all governments, the UK government is responsible for producing reports of official statistics on an ongoing basis.

UK government using R to modernize reporting of official statistics

That process has traditionally been a highly manual one: extract data from government systems, load it into a mainframe statistical analysis tool and run models and forecasts, extract the results to a spreadsheet to prepare data for presentation, and ultimately combine it all in a manual document editing tool to produce the final report. The process in the UK looks much like this today: Matt Upson, a Data Scientist at the UK Government Digital Service, is looking to modernize this process with a reproducible analytical pipeline.

This new process, based on the UK Government's Technology Service Manual for new IT deployments, aims to simplify the process by using R — the open-source programming language for statistical analysis — to automate the data extraction, analysis, and document generation tasks. The one thing you need to master data science. When you ask people what makes a person great – what makes someone an elite performer – they commonly say “talent.” Most people believe that elite performers are born with their talent. Most people believe that top performers come into the world with an innate talent that makes them special. You see something like this in data science too. People hear about elite data scientists and they assume that these people are just naturally gifted.

Selecting columns and renaming are so easy with dplyr. Talking about just selecting columns sounds boring, except it’s not with dplyr. I’m not going to try to convince you why it’s not, rather let’s start taking a look by doing. We’ll use the same flight data we imported last time. Why I love R Notebooks. By Nathan Stephens. Note: R Notebooks require RStudio Version 1.0 or later. I’m a big fan of the R console. During my early years with R, that’s all I had, so I got very comfortable with pasting my code into the console. Since then I’ve used many code editors for R, but they all followed the same paradigm – script in one window and get output in another window.

Lesser known dplyr tricks – R-bloggers. In this blog post I share some lesser-known (at least I believe they are) tricks that use mainly functions from dplyr. Removing unneeded columns. R Markdown: How to format tables and figures in .docx files. In research, we usually publish the most important findings in tables and figures. When writing research papers using Rmarkdown (*.Rmd), we have several options to format the output of the final MS Word document (.docx).

Tables can be formatted using either the knitr package’s kable() function or several functions of the pander package. Figure sizes can be determined in the chunk options, e.g. {r name_of_chunk, fig.height=8, fig.width=12}. R Markdown: How to number and reference tables. R Markdown is a great tool to make research results reproducible. However, in scientific research papers or reports, tables and figures usually need to be numbered and referenced. Unfortunately, R Markdown has no “native” method to number and reference table and figure captions. The recently published bookdown package makes it very easy to number and reference tables and figures (Link). However, since bookdown uses LaTeX functionality, R Markdown files created with bookdown cannot be converted into MS Word (.docx) files. Using knitr and pandoc to create reproducible scientific reports. Analytical and Numerical Solutions to Linear Regression Problems. This exercise focuses on linear regression with both analytical (normal equation) and numerical (gradient descent) methods.
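As a rough illustration of the two methods named in the linear regression exercise, the sketch below fits a one-variable model both ways in base R. The data are simulated, and the learning rate and iteration count are assumptions chosen for this toy example; it is not the exercise's own code.

```r
set.seed(1)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)  # simulated data: intercept 2, slope 3
X <- cbind(1, x)                     # design matrix with an intercept column

# Analytical solution (normal equation): solve (X'X) beta = X'y
beta_normal <- as.vector(solve(crossprod(X), crossprod(X, y)))

# Numerical solution: batch gradient descent on half the mean squared error
beta_gd <- c(0, 0)
alpha <- 0.02                        # learning rate (assumed, tuned by hand)
for (i in seq_len(20000)) {
  gradient <- as.vector(crossprod(X, X %*% beta_gd - y)) / n
  beta_gd <- beta_gd - alpha * gradient
}

# Show both estimates side by side
print(rbind(normal_equation = beta_normal, gradient_descent = beta_gd))
```

With enough iterations the two estimates should agree closely. Gradient descent becomes the practical choice mainly when the number of predictors makes solving the normal equation expensive.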

We will start with linear regression with one variable. From this part of the exercise, we will create plots that help to visualize how gradient descent gets the coefficient of the predictor and the intercept. In the second part, we will implement linear regression with multiple variables. Maximize manufacturing profit. Optimize! From Descriptive to Prescriptive Analytics. Data science projects predominantly deal with descriptive statistics. The common theme (especially on this blog) is to gather a dataset, visualize and describe it. The toolset consists of a combination of machine learning, descriptive statistics and (gg-)plots. RSQLite: Write a local data frame or file to the database. R and SQLite: Part 1.

R: Monitoring the function progress with a progress bar. A wrapper around nested ifelse. Online Text Correction. How to combine multiple CSV files into one using CMD. Markdown Tables generator. Version Control, File Sharing, and Collaboration Using GitHub and RStudio. The “Ten Simple Rules for Reproducible Computational Research” are easy to reach for R users. Empirical Software Engineering using R: first draft available for download. Implementation of a basic reproducible data analysis workflow. Principal Component Analysis. Endole: Business Information Company Check. How to really do an analysis in R (part 1, data manipulation) - SHARP SIGHT LABS. Ggedit – interactive ggplot aesthetic and theme editor.

Ggedit 0.0.2: a GUI for advanced editing of ggplot2 objects. The Meeting Point Locator. Two meanings of priors, part I: The plausibility of models. Books I like. R - Change default alignment in pander (pandoc.table). The PValues Data Table. Learning Statistics on Youtube. Products. Solarized - Ethan Schoonover. Rguide. GitHub - caesar0301/awesome-public-datasets: An awesome list of high-quality open datasets in public domains (on-going). Qinwf/awesome-R: A curated list of awesome R frameworks, packages and software.

Plot some variables against many others with tidyr and ggplot2. Express Intro to dplyr. One function to run them all… Or just eval. Virtual Library of Simulation Experiments: Test Functions and Datasets. First steps with Non-Linear Regression in R. CRAN Task View: Design of Experiments (DoE) & Analysis of Experimental Data. Python Annotated Heatmaps. 100 “must read” R-bloggers’ posts for 2015.

Google scholar scraping with rvest package. A Complete Tutorial on Time Series Modeling in R. Learning R Using a Chemical Reaction Engineering Book: Part 4. Bringing the powers of SQL into R. Using Python and R together: 3 main approaches. Introduction to bootstrap with applications to mixed-effect models. Mixture of Gaussian Distributions.

Demonstration of nls function. LsExamples. Learn R : 12 Books (Free PDFs!) and Online Resources - YOU CANalytics. Linear or Nonlinear Regression? That Is the Question. Wandering through the beautiful world of math, computations and visualizations. Online Derivative Calculator. The Yacas computer algebra system. R tips pages. Curve Fitting with Linear and Nonlinear Regression. Nonlinear Regression. Simple Nonlinear Regression. Using R for Time Series Analysis — Time Series 0.2 documentation. Trevor Stephens — Titanic: Getting Started With R. Data Analytics for Beginners: Part 1. Weekly road fuel prices - Statistical data sets.

Using Linear Regression to Predict Energy Output of a Power Plant. All Datasets. Learning Chemical Engineering. District Data Labs - How to Transition from Excel to R. Rounding numbers in Access. Design of Experiments. Two Way ANOVA in R. Introduction To Random Forest - Simplified. Powerful Guide to learn Random Forest in R and Python. Yet another post on google scholar data analysis. Probability and statistics. Linear algebra. Learning Path To Start Your Data Science Career.