
Workflow, Collaboration, Reproducibility


Reproducible Research. The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.

Reproducible Research

R largely facilitates reproducible research through literate programming: a document that combines content and data analysis code. The Sweave function (in the base R utils package) and the knitr package can be used to blend the subject matter and R code so that a single document defines both the content and the algorithms. Basic packages can be structured into the following groups:

LaTeX Markup: The Hmisc, xtable and tables packages contain functions to write R objects into LaTeX representations. Hmisc also includes methods for translating strings to proper LaTeX markup (e.g., ">=" to "$\geq$").

An incomplete list of packages which facilitate literate programming for specific types of analysis or objects:

Reproducible Analytical Pipeline. Producing official statistics for publications is a key function of many teams across government.
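The string translation mentioned for Hmisc (">=" to "$\geq$") can be illustrated with a small base R sketch. This is a hand-rolled illustration only, not Hmisc's implementation; the function name to_latex_markup and its lookup table are hypothetical.

```r
# Hand-rolled illustration only: this is NOT how Hmisc implements it.
# to_latex_markup() and its lookup table are hypothetical names.
to_latex_markup <- function(x) {
  # Plain-text operators and the LaTeX markup they should become
  replacements <- c(
    ">=" = "$\\geq$",
    "<=" = "$\\leq$",
    "!=" = "$\\neq$"
  )
  for (pattern in names(replacements)) {
    # fixed = TRUE matches the operator literally, not as a regular expression
    x <- gsub(pattern, replacements[[pattern]], x, fixed = TRUE)
  }
  x
}

# Example: print a table label with its operators translated
cat(to_latex_markup("Dose >= 10 mg and age <= 65"), "\n")
```

In a real document you would more likely rely on the package functions themselves; the sketch only shows the kind of substitution involved.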

Reproducible Analytical Pipeline

It’s a time-consuming and meticulous process to ensure that statistics are accurate and timely. With open source software becoming more widely used, there’s now a range of tools and techniques that can be used to reduce production time, whilst maintaining and even improving the quality of the publications. This post is about these techniques: what they are, and how we can use them.

The current statistics production process. The process for official statistics production in Government varies widely across departments. Broadly speaking, data are extracted from a datastore (whether it is a data lake, database, spreadsheet, or flat file), and are manipulated in a proprietary statistical software package, and possibly in proprietary spreadsheet software.

Nteract.

Markdown or LaTeX? What happens if you ask for too much power from Markdown?

Markdown or LaTeX?

R Markdown is one of the document formats that knitr supports, and it is probably the most popular one. I have been asked many times about the choice between Markdown and LaTeX, so I think I'd better wrap up my opinions in a blog post. These two languages (do you really call Markdown a language?) are kind of at the two extremes: Markdown is super easy to learn and type, but it is primarily targeted at HTML pages, and you do not have fine control over typesetting (really? Really?). What is the problem? What is the root problem?

R Markdown Custom Formats. The R Markdown package ships with a raft of output formats including HTML, PDF, MS Word, R package vignettes, as well as Beamer and HTML5 presentations.

R Markdown Custom Formats

This isn’t the entire universe of available formats though (far from it!). R Markdown formats are fully extensible and as a result there are several R packages that provide additional formats. In this post we wanted to highlight a few of these packages, including: tufte (documents in the style of Edward Tufte), rticles (formats for creating LaTeX-based journal articles), and rmdformats (formats for creating HTML documents).

Tufte Handouts. Overview: Tufte Handouts are documents formatted in the style that Edward Tufte uses in his books and handouts.

Tufte Handouts

Tufte’s style is known for its extensive use of sidenotes, tight integration of graphics with text, and well-set typography. You can see a full example of a document produced with the Tufte Handout template here: Tufte Handout Example.

Production Quality Report with R and knitr.

    \documentclass[nohyper,justified]{tufte-handout}
    \usepackage{xltxtra,fontspec,xunicode}
    \usepackage{sectsty} %% change fonts for sections
    \setmainfont{Source Sans Pro Light}
    \begin{document}
    \pagenumbering{gobble}

Production Quality Report with R and knitr

Writing papers using R Markdown. I have been watching the activity in RStudio and knitr for a while, and have even been using Rmd (R markdown) files in my own work as a way to easily provide commentary on an actual dataset analysis.

Writing papers using R Markdown

Yihui has proposed writing papers in markdown and posting them to a blog as a way to host a statistics journal, and lots of people are now using knitr as a way to create reproducible blog posts that include code (including yours truly). The idea of writing a paper that actually includes the necessary code to perform the analysis, and is actually readable in its raw form, and that someone else could actually run was pretty appealing. Unfortunately, I had not had the time or opportunity to actually try it, until recently our group submitted a conference paper that included a lot of analysis in R that seemed like the perfect opportunity to try this.

Fast-track publishing using knitr: the setup using .RProfile with custom CSS + some HTML goodies (part II). Fast-track publishing using knitr is a short series on how I use knitr to get my articles published faster.

Fast-track publishing using knitr: the setup using .RProfile with custom CSS + some HTML goodies (part II)

This is part II, where I will show how you can tweak RStudio into producing seamless MS Word integration by using the .RProfile together with CSS, a few basics about HTML that might be good to know, and lastly some special characters that can be useful. In the previous post, part I, I explained some of the more general concepts behind fast-track publishing and why I try to get my manuscript into MS Word instead of using LaTeX or other alternatives. The series consists of five posts:

Fast-track publishing using knitr: exporting images for sharing and press (part III)

Fast-track publishing using knitr is a short series on how I use knitr to speed up publishing in my research.

Fast-track publishing using knitr: exporting images for sharing and press (part III)

This is the third article in the series, devoted to plots. Hopefully this post will give you the need-to-know stuff so that you can (1) add auto-numbering to your figures, (2) decide on image formats, (3) choose image resolution, and (4) get anti-aliasing working. The series consists of five posts:

Floating table of contents for your html reports using knitr.
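Two of the points listed for part III, figure auto-numbering and image resolution, can be sketched in base R. This is only an assumed illustration and not the method used in the series; make_figure_counter and pixels_for are hypothetical helper names.

```r
# Hypothetical helpers; not taken from the fast-track publishing series.

# Returns a function that prefixes each caption with a running figure number.
make_figure_counter <- function(prefix = "Figure") {
  n <- 0L
  function(caption) {
    n <<- n + 1L  # update the counter kept in the enclosing environment
    sprintf("%s %d: %s", prefix, n, caption)
  }
}

# Pixel dimension needed for a given physical size and resolution.
pixels_for <- function(inches, dpi) {
  as.integer(round(inches * dpi))
}

fig <- make_figure_counter()
cat(fig("Age distribution"), "\n")
cat(fig("Survival by treatment"), "\n")
cat("A 7 inch wide plot at 300 dpi needs", pixels_for(7, 300), "pixels\n")
```

The pixel count can be passed as the width or height of a bitmap device such as png(); the captions can be pasted into whatever caption mechanism your output format offers.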

If you love knitr and RStudio and use them to produce long reports, you probably know that you can produce a table of contents in your HTML (and PDF) documents.

Floating table of contents for your html reports using knitr

In newer versions of RStudio (0.98.801 or later) you do it by requesting a toc in the doc header, something like this:

    title: "cssTest"
    output:
      html_document:
        toc: yes

But wouldn’t it be nice if the table of contents, instead of being stuck at the top of the document, were available at the edge of the browser window and stayed put when you scroll?
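As a hedged aside: later versions of the rmarkdown package than the one discussed here offer a toc_float option for html_document, which is a different route from a custom CSS approach. The sketch below only writes a minimal .Rmd file with such a header; the section text is made up for illustration, and rendering is left as a comment because it needs rmarkdown and pandoc.

```r
# Sketch: write a minimal R Markdown file whose header asks for a floating
# table of contents. Assumes an rmarkdown version that supports toc_float.
header <- c(
  "---",
  'title: "cssTest"',
  "output:",
  "  html_document:",
  "    toc: yes",
  "    toc_float: yes",
  "---"
)

# Illustrative body text only
body <- c("", "# First section", "", "Some text.", "", "# Second section", "", "More text.")

rmd_path <- file.path(tempdir(), "cssTest.Rmd")
writeLines(c(header, body), rmd_path)

# To build the HTML report (requires the rmarkdown package and pandoc):
# rmarkdown::render(rmd_path)
```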

Using R in LaTeX with knitr and RStudio. Hi, I presented today at the INSEE R user group (FL R) how to use knitr (the evolution of Sweave) for writing documents that are self-contained with respect to the source code: your data changed? No big deal, just compile your .Rnw file again and you are done with an updated version of your paper!

How do I Sweave a multiple-file project.

R - Repeat headers when using xtable with longtable option.

Scientific notation for R/latex. Using R within a LaTeX document can be a component of reproducible research, offering (a) some assurance against typographical errors in transcribing results to the LaTeX file and (b) the ability for others to reproduce the results. For example, one might like to explain how close the computed integral of the Witch of Agnesi function (## 12.56637 with absolute error < 1.3e-09) is to the true value of $4\pi$.
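A minimal base R sketch of the Witch of Agnesi example follows. It assumes the standard curve with a = 1, that is 8 / (x^2 + 4), whose integral over the real line is $4\pi$; the original post may define the function differently. The helper sci_latex is hypothetical and shows one way to turn a number such as 1.3e-09 into LaTeX scientific notation.

```r
# Assumption: the Witch of Agnesi with a = 1, y = 8 / (x^2 + 4).
witch <- function(x) 8 / (x^2 + 4)

# Numerical integral over the whole real line
res <- integrate(witch, lower = -Inf, upper = Inf)

# Hypothetical helper: format a number as mantissa \times 10^{exponent}.
sci_latex <- function(x, digits = 1) {
  if (x == 0) return("0")
  exponent <- floor(log10(abs(x)))
  mantissa <- round(x / 10^exponent, digits)
  # Rounding can push the mantissa up to 10; renormalise if so
  if (abs(mantissa) >= 10) {
    mantissa <- mantissa / 10
    exponent <- exponent + 1
  }
  fmt <- paste0("%.", digits, "f \\times 10^{%d}")
  sprintf(fmt, mantissa, as.integer(exponent))
}

cat("Computed value:", format(res$value, digits = 7), "\n")
cat("True value 4*pi:", format(4 * pi, digits = 7), "\n")
cat("Error bound in LaTeX: $", sci_latex(res$abs.error), "$\n")
```

In a knitr document the same helper could be called from an inline R expression so that the reported error bound is always typeset from the computed result rather than typed by hand.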

Reproducible research with R, Knitr, Pandoc, Word. Below I briefly outline why Pandoc is an essential part of my research workflow, and demonstrate how to seamlessly integrate it with a bibliographic system and code written in R to produce high quality Word or PDF documents. I also include all the functions needed to get this working fast. Knitr is great. I'm writing this in it right now. It 'knits' markdown together with R code and outputs some pretty excellent HTML pages. The difficulty is getting these into Word for final editing, emailing to colleagues, or similar. There is an R library, Pander, which works well. Write a markup document in RStudio, set your working directory to the location of your file, then compile it as follows:

    name = "demo"
    library(knitr)
    knit(paste0(name, ".Rmd"), encoding = "utf-8")
    system(paste0("pandoc -o ", name, ".docx ", name, ".md"))

The code above works by running the command line from within R.
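The same knit-then-pandoc steps can be wrapped in a small function. The sketch below is one possible arrangement, not the author's own functions: pandoc_command and knit_to_word are hypothetical names, and the wrapper assumes knitr is installed and pandoc is on the PATH.

```r
# Hypothetical helpers wrapping the knit + pandoc steps described above.

# Build the pandoc command line; shQuote() protects names containing spaces.
pandoc_command <- function(name, to = "docx") {
  input <- paste0(name, ".md")
  output <- paste0(name, ".", to)
  paste("pandoc", "-o", shQuote(output), shQuote(input))
}

# Knit name.Rmd to name.md, then convert the markdown to Word with pandoc.
knit_to_word <- function(name) {
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("The knitr package is not installed")
  }
  if (!nzchar(Sys.which("pandoc"))) {
    stop("pandoc was not found on the PATH")
  }
  knitr::knit(paste0(name, ".Rmd"), encoding = "UTF-8")
  system(pandoc_command(name))
}

# Show the command that would be run for demo.Rmd
cat(pandoc_command("demo"), "\n")
```

Calling knit_to_word("demo") from the directory that contains demo.Rmd should then leave demo.md and demo.docx next to it.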

Writing a MS-Word document using R (with as little overhead as possible)

ReporteRs. Produce nice outputs for graphical, tabular and textual reporting. ReporteRs is an R package for creating Microsoft (Word docx and Powerpoint pptx) and HTML documents. It does not require any Microsoft component to be used. It runs on Windows, Linux, Unix and Mac OS systems.

R Markdown: How to insert page breaks in a MS Word document.

Simple workflow for using R with MS Office products. \documentclass[xcolor=svgnames]{beamer}

Reproducible research, training wheels, and knitr. Last week I gave a short talk at CMU’s statistical computing seminar, Stat Bytes.

Tips and Tricks for HTML and R.

Stargazer package for beautiful LaTeX tables from R stat models output. Stargazer is a new R package that creates LaTeX code for well-formatted regression tables, with multiple models side-by-side, as well as for summary statistics tables.

Its latest version, released in early January 2013, can also output the content of data frames directly into LaTeX. Compared to available alternatives, stargazer excels in three regards: its ease of use, the large number of models it supports, and its beautiful aesthetics.

Ease of use: stargazer was designed with the user’s comfort in mind. The learning curve is very mild and all arguments are very intuitive, so that even a beginning user of R or LaTeX can quickly become familiar with the package’s many capabilities.

Table as an Image in R. Usually, it's best to keep tables as text, but if you're making a lot of graphics, it can be helpful to be able to create images of tables.

Creating the Table. After loading the data, let's first use this trick to put line breaks between the levels of the effect variable. Depending on your data, you may or may not need or want to do this.

Customising ProjectTemplate in R.

Rstudio and makefiles: Mind your options!

Sharing a Project with multiple users – RStudio Support. If you are using RStudio Projects, you may want to share a project with multiple users on the same network in order to work collaboratively on the same project.

Copying files with R.

Scheduling R Markdown Reports via Email.

Renkun-ken/formattable.

Automating R exercises and exams using the exams package. It's a pain to design statistics exercises each semester, and because students from previous semesters share old exercises with the new incoming students, it's hard to design simple exercises that students haven't already seen the answers to. On top of that, some students try to cheat during the exam by looking over the shoulder of their neighbors.

Homework exercises almost always involve collaboration even if you prohibit it. It turns out that you can automate the generation of fixed-format exercises (with different numerical answers being required each time). You can also randomly select questions from a question bank you create yourself. And you can even create a unique question paper for each student in an exam, to make cheating between neighbors essentially impossible (even if they copy the correct answer to question 2 from a neighbor, they end up answering the wrong question on their own paper).
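The idea of a fixed-format exercise with a different numerical answer for each student can be sketched in base R. This sketch deliberately does not use the exams package or its API; make_exercise is a hypothetical function, and using the student ID as the random seed is an assumption made purely for illustration.

```r
# Base R illustration of the idea only; this does not use the exams package.
# make_exercise() is a hypothetical helper.
make_exercise <- function(student_id) {
  # Seeding with the student ID makes each paper reproducible for grading.
  # Note that set.seed() changes the global random number state.
  set.seed(student_id)
  x <- sample(10:99, size = 5)
  list(
    question = sprintf("Compute the mean of: %s", paste(x, collapse = ", ")),
    answer = mean(x)
  )
}

# One paper per student: same question format, different numbers
papers <- lapply(1:3, make_exercise)
for (p in papers) {
  cat(p$question, "\n")
}
```

Because the numbers are regenerated from the seed, the marking key for any student can be rebuilt later by calling make_exercise with the same ID.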

RStudio and GitHub. Using Gitbook with R Markdown.