
Miscellaneous


Create smooth animations in R with the tweenr package. There are several tools available in R for creating animations (movies) from statistical graphics.

Create smooth animations in R with the tweenr package

The animation package by Yihui Xie will create an animated GIF or video file, using a series of R charts you generate as the frames. And the gganimate package by David Robinson is an extension to ggplot2 that will create a movie from charts created using the ggplot2 syntax: in much the same way that you can create multiple panels using faceting, you can instead create an animation with multiple frames. But from a storytelling perspective, such animations can sometimes seem rather disjointed.

For example, here's an example (from the gganimate documentation) of creating an animated bubble chart from the gapminder data. (NOTE: to use the gganimate package, you will need to install ImageMagick.)

Creating an animation using R. In this post, I will show you how to create an animation using R and ffmpeg.

Creating an animation using R

The idea is pretty simple:

1. Generate a number of snapshots
2. Combine them in a video file using ffmpeg

The best way to learn about the art of animation is by doing it ourselves, so our worked example is the infamous Broken Stick Problem.

Dual axes time series plots with various more awkward data. In my most recent blog post I introduced the dualplot() R function, which allows you to create time series plots with two different scales on the vertical axes in a way that minimises the potential problems of misinterpretation.
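The two steps of the snapshot-and-ffmpeg approach might look roughly like the sketch below. This is not the original post's code. It assumes the classic version of the Broken Stick Problem (break a unit stick at two uniform random points and ask whether the pieces form a triangle), and the file names, frame size, and ffmpeg options are illustrative choices. The ffmpeg call only runs if an `ffmpeg` binary is found on the PATH.

```r
# Broken Stick Problem (assumed classic version): break a unit stick at two
# uniform random points; the three pieces form a triangle only if the longest
# piece is shorter than 1/2.
set.seed(42)
n_sims <- 2000
cuts <- matrix(runif(2 * n_sims), ncol = 2)
lo <- pmin(cuts[, 1], cuts[, 2])
hi <- pmax(cuts[, 1], cuts[, 2])
longest <- pmax(lo, hi - lo, 1 - hi)
is_triangle <- longest < 0.5
running_est <- cumsum(is_triangle) / seq_len(n_sims)

# Step 1: generate a number of snapshots (one PNG per frame).
frame_dir <- file.path(tempdir(), "frames")  # hypothetical output location
dir.create(frame_dir, showWarnings = FALSE)
frame_ends <- seq(100, n_sims, by = 100)
if (capabilities("png")) {
  for (i in seq_along(frame_ends)) {
    k <- frame_ends[i]
    png(file.path(frame_dir, sprintf("frame_%03d.png", i)),
        width = 640, height = 480)
    plot(seq_len(k), running_est[seq_len(k)], type = "l",
         xlim = c(1, n_sims), ylim = c(0, 1),
         xlab = "Number of sticks broken", ylab = "Estimated P(triangle)")
    abline(h = 0.25, lty = 2)  # theoretical probability for this version
    dev.off()
  }
}

# Step 2: combine the snapshots into a video file, if ffmpeg is available.
ffmpeg <- Sys.which("ffmpeg")
if (nzchar(ffmpeg) && length(list.files(frame_dir)) > 0) {
  out_file <- file.path(tempdir(), "broken_stick.mp4")
  system2(ffmpeg, c("-y", "-framerate", "5",
                    "-i", shQuote(file.path(frame_dir, "frame_%03d.png")),
                    "-pix_fmt", "yuv420p", shQuote(out_file)))
}
```

Each frame extends the running estimate by 100 simulated sticks, so the video shows the estimate settling towards the dashed reference line.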

Dual axes time series plots with various more awkward data

See that earlier post for a discussion of the pros and cons of the whole approach, which I won’t repeat here. I’ve now made some minor enhancements:

- improved handling of series that start at different times
- fixed some minor problems with series that have different frequencies (e.g. monthly versus quarterly)
- added a fairly sensible default choice of axes for when one or both of the series dip into negative territory
- improved default legend and axis titles for situations when the data are columns in a data frame
- improved x axis labels for when x is an object with date or time characteristics

When time series start at different times: default to use first available “cross over point”; starting at the same vertical position; converging to a common vertical position.

Dual axes time series plots may be ok sometimes after all. Are they really as bad as all that?

Dual axes time series plots may be ok sometimes after all

I’ve been mulling over time series charts with two different y axes, which are widely deprecated in the world of people who see ourselves as professional data workers. Looking down on dual axis time series charts is one of the things that marks one as a serious data visualiser - after shaking our heads at pie charts, and cringing in horror at three-dimensional chart junk. But I’ve come to the surprising conclusion (surprising for me) that the arguments against them don’t stack up - at least not to the stage of justifying a blanket ban. Before I go further, here’s my best example of one of these charts with real data, as a talking point. The data come from the Reserve Bank of New Zealand, and my chart below is essentially a version of their graphic that I’ve enhanced to meet my own style requirements. Stephen Few looks at the issue of dual axes plots in a well reasoned piece and concludes cautiously that they are never justified.

Explore with Shiny the impact of sample size on "p-charts". I wanted to help explore the implications of changing sample size for a quality control process aimed at determining the defect rate in multiple sites.

Explore with Shiny the impact of sample size on "p-charts"

Defect in this particular case is binary, i.e. the products are either good or not. Much of the advice on this in the quality control literature strikes me as rather abstract and technical, while still not sufficiently detailed on what really needs to be done for power calculations. My idea was to instead let an end user see in advance what sort of thing they would see if they went into the exercise with different sample sizes. The result was this interactive R/Shiny/ggvis web app. I’m not an expert in statistical quality control by any means, but the statistics are straightforward.

Stefano-meschiari/latex2exp.

Getting Genetics Done: R: single plot with two different y-axes. I forgot where I originally found the code to do this, but I recently had to dig it out again to remind myself how to draw two different y axes on the same plot to show the values of two different features of the data.
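To make the sample-size effect on a p-chart concrete, here is a small sketch using the usual three-sigma limits, centre ± 3 * sqrt(centre * (1 - centre) / n). It is not the code behind the Shiny app. The helper name `p_chart_limits`, the defect rate, the number of sites, and the sample sizes are all assumptions for illustration.

```r
# Three-sigma p-chart limits, assuming the same sample size n at every site.
# `defects` is the number of defective items found at each site.
p_chart_limits <- function(defects, n) {
  p_hat <- defects / n                            # observed rate per site
  p_bar <- sum(defects) / (n * length(defects))   # pooled rate (centre line)
  se <- sqrt(p_bar * (1 - p_bar) / n)
  list(p_hat = p_hat,
       centre = p_bar,
       lower = max(0, p_bar - 3 * se),
       upper = min(1, p_bar + 3 * se))
}

set.seed(1)
true_rate <- 0.08  # assumed underlying defect rate
n_sites <- 12      # assumed number of sites
for (n in c(50, 200, 1000)) {
  defects <- rbinom(n_sites, size = n, prob = true_rate)
  lim <- p_chart_limits(defects, n)
  cat(sprintf("n = %4d: limits [%.3f, %.3f], width %.3f\n",
              n, lim$lower, lim$upper, lim$upper - lim$lower))
}
```

Because the standard error shrinks with the square root of n, quadrupling the sample size roughly halves the width of the control band, which is the kind of trade-off an end user would see when trying different sample sizes.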

Getting Genetics Done: R: single plot with two different y-axes

This is somewhat distinct from the typical use case of aesthetic mappings in ggplot2 where I want to have different lines/points/colors/etc. for the same feature across multiple subsets of data. For example, I was recently poking around with some data examining enrichment of a particular set of genes using a hypergeometric test as I was fiddling around with other parameters that included more genes in the selection (i.e., in the classic example, the number of balls drawn from some hypothetical urn). I wanted to show the -log10(p-value) on one axis and some other value (e.g., “n”) on the same plot, using a different axis on the right side of the plot.
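One common way to draw such a plot in base R is to overlay a second plot with `par(new = TRUE)` and add its axis on the right with `axis(side = 4)`. The sketch below is not the original post's code, and the enrichment data are made up: the urn size, gene-set size, and hit counts are hypothetical values chosen only to give the two series different scales.

```r
# Hypothetical enrichment data: at each parameter setting, n genes are drawn
# from an urn of N genes, K of which are in the gene set, giving k hits.
N <- 20000
K <- 500
setting <- 1:20                      # hypothetical parameter settings
n <- seq(100, 2000, by = 100)        # genes selected at each setting
k <- round(0.05 * n)                 # assumed number of hits
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)  # P(X >= k)
neg_log_p <- -log10(p_value)

# Widen the right margin so the second axis and its label fit.
old_par <- par(mar = c(5, 4, 4, 5) + 0.1)
plot(setting, neg_log_p, type = "b", col = "red",
     xlab = "Parameter setting", ylab = "-log10(p-value)")

# Overlay the second series on its own scale, without redrawing axes.
par(new = TRUE)
plot(setting, n, type = "b", col = "blue", axes = FALSE, xlab = "", ylab = "")
axis(side = 4)
mtext("n", side = 4, line = 3)
legend("topleft", legend = c("-log10(p-value)", "n"),
       col = c("red", "blue"), lty = 1, pch = 1)
par(old_par)
```

Colouring each series to match its legend entry helps readers tell which axis belongs to which line.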

Here’s how to do it. First, generate some data:

Call matplotlib from R. Matti Pastell. Motivation: I often use Python and matplotlib for exploring measurement data (e.g. from accelerometers), even if I use R for the actual analysis.

Call matplotlib from R

The reason is that I like to be able to flexibly zoom into different parts of the plot using the mouse and this works well for me with matplotlib. So I decided to try to call matplotlib from R using Rcpp and the Python/C API. It was surprisingly simple to get it working and I put together a small R package, Rpyplot.

Circular Migration Flow Plots. An article of mine was published in Science today.

Circular Migration Flow Plots

It introduces estimates for bilateral global migration flows between all countries. The underlying methodology is based on the conditional maximisation routine in my Demographic Research paper. However, I tweaked the demographic accounting, which ensures the net migration in the estimated migration flow tables matches the net migration figures from the United Nations very closely. My co-author, Nikola Sander, developed some circular plots for the paper based on Circos in Perl.

Alluvial diagrams. A parallel coordinates plot is one of the tools for visualizing multivariate data.

Alluvial diagrams

Every observation in a dataset is represented with a polyline that crosses a set of parallel axes corresponding to variables in the dataset. You can create such plots in R using the function parcoord in package MASS. For example, we can create such a plot for the built-in dataset mtcars: This produces the plot below. The lines are colored using a blue-to-red color ramp according to the miles-per-gallon variable. What to do if some of the variables are categorical? This produces the following (red lines are for passengers who did not survive): It is not so easy to read, is it?

Bullet Graph in R. Stephen Few designed the Bullet Graph as a way to display measurements vs. goals/other benchmarks or as a comparison of benchmarks.
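For the mtcars parallel coordinates plot mentioned in the alluvial diagrams excerpt, a minimal sketch could look like this. The original code did not survive in this excerpt, so the way the blue-to-red ramp is built here (one colour per car, ordered by rank of mpg) is an assumption rather than the author's method.

```r
library(MASS)  # provides parcoord()

# Blue-to-red colour ramp with one colour per car; cars are assigned colours
# by their rank in miles per gallon (lowest mpg = blue, highest = red).
ramp <- colorRampPalette(c("blue", "red"))(nrow(mtcars))
line_cols <- ramp[rank(mtcars$mpg, ties.method = "first")]

# One polyline per car across all variables, labelled with each axis range.
parcoord(mtcars, col = line_cols, var.label = TRUE)
```

`parcoord()` rescales every variable to a common vertical range, so it works directly on numeric columns. Categorical variables have no such scale, which is the difficulty the excerpt goes on to raise.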

Bullet Graph in R

The screenshot below shows the bullet chart built to Few’s specifications in R, based on the grid package. The code and a detailed example can be found on github. Next steps are:

- Comparison of current measure to a future target
- Some alternative designs, such as displaying negative and positive values, reversed background filling, and reversing of colour and scale
- More than three ranges
- Comparative measures…

Gauge Chart in R.

Mathematical Annotation in R. R Core Team (R-core@R-project.org), The R Project for Statistical Computing. Lijia Yu (yu@lijiayu.net / GitHub / Twitter), a master candidate majoring in Bioinformatics at Beijing Institute of Genomics.

Karl Broman (kbroman@biostat.wisc.edu / GitHub / Twitter), Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison; research in statistical genetics; developer of R/qtl (for R).

Kevin Ushey (kevinushey@gmail.com / GitHub / Twitter), MSc., University of British Columbia; interested in statistical genetics and data visualization.

Want to write mathematical symbols and expressions in R graphics?

You can use an R expression() instead of normal text, e.g. plot(1:10, main = expression(alpha + beta)). Below is a demo that shows you everything about plotting math in R (it was written by the R Core Team; see ? If you want to combine multiple mathematical expressions with text, use paste() inside expression(), as in the following.

Exporting nice plots in R. A vital part of statistics is producing nice plots, an area where R is outstanding. The graphical ability of R is often listed as a major reason for choosing the language.

It is therefore funny that exporting these plots is such an issue in Windows.

How to change the alpha value of colours in R. Often I like to reduce the alpha value (level of transparency) of colours to identify patterns of over-plotting when displaying lots of data points with R. So, here is a tiny function that allows me to add an alpha value to a given vector of colours, e.g. a RColorBrewer palette, using col2rgb and rgb, which has an argument for alpha, in combination with the wonderful apply and sapply functions. The example below illustrates how this function can be used with colours provided in different formats, thanks to the col2rgb function.

Session Info
sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RCurl_1.95-4.1 bitops_1.0-5
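The alpha helper described in that excerpt is not reproduced here, but a function along the same lines (col2rgb and rgb combined with sapply and apply) might look like the sketch below. The name `add_alpha` and its argument names are hypothetical.

```r
# Add an alpha value (0 = fully transparent, 1 = opaque) to a vector of
# colours given in any format col2rgb() understands: colour names, hex
# strings, or palette indices.
add_alpha <- function(col, alpha = 1) {
  if (missing(col)) stop("Please provide a vector of colours.")
  # sapply() builds a 3 x length(col) matrix of red/green/blue values in 0..1;
  # apply() then turns each column back into a hex colour with alpha appended.
  apply(sapply(col, col2rgb) / 255, 2,
        function(x) rgb(x[1], x[2], x[3], alpha = alpha))
}

# Colours supplied in different formats.
cols <- c("red", "#00FF00", "blue")
semi <- add_alpha(cols, alpha = 0.3)

# Over-plotting becomes visible where semi-transparent points pile up.
set.seed(3)
plot(rnorm(3000), rnorm(3000), pch = 16, col = semi[1],
     xlab = "x", ylab = "y")
```

Base R also ships `adjustcolor(col, alpha.f = ...)` in grDevices, which covers the same need without a custom helper.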

Exporting plain, lattice, or ggplot graphics. In a recent post I compared the Cairo packages with the base package for exporting graphs. Matt Neilson was kind enough to share in a comment that the Cairo library is now by default included in R, although you need to specify the type="cairo" option to invoke it. In this post I examine how the ggplot and the lattice packages behave when exporting. Basic plot.

Setting graph margins in R using the par() function and lots of cow milk. It is fairly straightforward to set the margins of a graph in R by calling the par() function with the mar (for margin!) argument. For example, par(mar=c(5.1,4.1,4.1,2.1)) sets the bottom, left, top and right margins respectively of the plot region in number of lines of text.
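A minimal sketch of the par(mar = ...) call in use follows. The plotted data and the wider right margin are illustrative assumptions, not taken from the original post.

```r
# par() returns the previous settings, so they can be restored afterwards.
# c(5.1, 4.1, 4.1, 2.1) is bottom, left, top, right, in lines of text.
old_par <- par(mar = c(5.1, 4.1, 4.1, 2.1))
plot(1:10, main = "Default-sized margins")

# Widen the right margin, e.g. to make room for a label on that side.
par(mar = c(5.1, 4.1, 4.1, 6.1))
plot(1:10, main = "Wider right margin")
mtext("Right-hand label", side = 4, line = 3)

par(old_par)  # restore the settings that were in place before
```

Saving and restoring the old settings keeps later plots in the same session from silently inheriting the changed margins.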

High Resolution Figures in R. As I was recently preparing a manuscript for PLOS ONE, I realized the default resolution of R and RStudio images is insufficient for publication. PLOS ONE requires 300 ppi images in TIFF or EPS (encapsulated postscript) format. In R, plots are exported at 72 ppi by default. I love RStudio but was disappointed to find that there were no options for exporting figures at high resolution. PLOS ONE has extensive instructions for scaling, compressing, and converting image files to meet their standards.

Unfortunately, there is no good way to go from low resolution to high resolution (i.e. rescaling in Photoshop), as my friend and phytools author Liam Revell pointed out with an entertaining illustration from PhD Comics (upper right panel).
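Since upscaling after the fact does not work, one option is to render at the target resolution from the start with a base R graphics device. The sketch below writes a 300 ppi TIFF. The figure size, file name, LZW compression, and plotted data are assumptions, and the device is only opened if this R build reports TIFF support.

```r
# Write a 300 ppi TIFF, assuming a hypothetical figure size of 5 x 4 inches.
out_file <- file.path(tempdir(), "figure1.tiff")

if (capabilities("tiff")) {
  # With units = "in", res = 300 gives a 1500 x 1200 pixel image.
  tiff(out_file, width = 5, height = 4, units = "in", res = 300,
       compression = "lzw")
  plot(mtcars$wt, mtcars$mpg,
       xlab = "Weight (1000 lb)", ylab = "Miles per gallon")
  dev.off()  # close the device so the file is actually written
}
```

Specifying the size in inches together with `res` keeps text and point sizes proportionate. Raising the pixel dimensions alone, at the default resolution, tends to make labels look tiny.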