Workflow, Collaboration, Reproducibility

Project workflow.

Designing projects - Nice R Code. The scientific process is naturally incremental, and many projects start life as random notes, some code, then a manuscript, and eventually everything is a bit mixed together.

R Best Practices. When learning to write code for research projects, it can be overwhelming to figure out how to set up a project, and the novice programmer may not yet have the experience necessary to foresee the potential pitfalls of a given, seemingly inconsequential decision.

This post provides a discussion of best practices for developing code-based projects and for writing R code in a research setting, with an eye toward proactively avoiding common pitfalls. While reading, it is worth keeping in mind that what works will likely vary by project and by collaborator, but a consistent and well-thought-out approach to designing project structures and writing code provides a strong base from which to develop subsequent projects.

1.1 Keep files in a project folder

It is good practice to keep all files for a given project in the same project-specific folder.
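A minimal sketch of the project-folder practice described above, in base R. The function name `create_project` and the sub-folder names are illustrative assumptions, not a layout prescribed by the post; adjust them to whatever suits the project.

```r
# Create a project folder with one sub-folder per type of file.
# The sub-folder names are illustrative assumptions, not a prescribed layout.
create_project <- function(path,
                           subdirs = c("data", "figures", "R", "manuscript")) {
  for (d in file.path(path, subdirs)) {
    # recursive = TRUE also creates the project folder itself;
    # showWarnings = FALSE keeps the call quiet if the folder already exists
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(normalizePath(path))
}

# Example: build the skeleton in a temporary location
project <- create_project(file.path(tempdir(), "my-project"))
list.dirs(project, full.names = FALSE, recursive = FALSE)
```

Because existing folders are left untouched, the function can be re-run safely on a project that already exists.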

You can then create sub-folders for specific types of files, such as data, figures, function files, and manuscripts.

Prime Hints For Running A Data Project In R.

Sharing a Project with multiple users – RStudio Support. If you are using RStudio Projects, you may want to share a project with multiple users on the same network in order to work collaboratively on the same project.

However, the default R setup uses the same .Rhistory and .RData files for all users accessing the project, which will lead to conflicts. You can solve this by creating the project in a location accessible to all the users you wish to be able to edit it and adding the following code to a .Rprofile file in that project’s working directory. This will cause each user's .Rhistory and .RData to be saved to separate files within a subdirectory of the project folder.

    # define user data directory and history file location
    local({
      dataDir <- "userdata"
      if (identical(.Platform$OS.type, "windows"))
        username <- Sys.getenv("USERNAME")
      else
        username <- Sys.getenv("USER")
      userDir <- file.path(dataDir, username)
      if (!
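The code above is cut off in this excerpt. As a hedged sketch of the idea it describes, and not the remainder of the original snippet, the function below creates a per-user subdirectory and points R's history file at it through the `R_HISTFILE` environment variable. The function name `use_user_dir`, the `"unknown"` fallback, and the choice to return the workspace path are assumptions made for illustration.

```r
# Sketch: give each user a private directory for history and workspace files.
# `use_user_dir` and the "unknown" fallback are illustrative assumptions.
use_user_dir <- function(dataDir = "userdata") {
  # USERNAME is set on Windows, USER on Unix-alikes
  username <- if (identical(.Platform$OS.type, "windows")) {
    Sys.getenv("USERNAME")
  } else {
    Sys.getenv("USER")
  }
  if (!nzchar(username)) username <- "unknown"

  userDir <- file.path(dataDir, username)
  if (!dir.exists(userDir)) {
    dir.create(userDir, recursive = TRUE)
  }
  userDir <- normalizePath(userDir)

  # R reads the location of the history file from R_HISTFILE
  Sys.setenv(R_HISTFILE = file.path(userDir, ".Rhistory"))

  # Return the per-user workspace path, e.g. for save.image(file = ...)
  invisible(file.path(userDir, ".RData"))
}

# Example: try it out in a temporary location
workspace_file <- use_user_dir(file.path(tempdir(), "userdata"))
```

In a shared project, a call like this would sit in the project's .Rprofile so that it runs at startup for every user.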

My RStudio Configuration.

`use_blank_slate()` sets your @rstudio preference to NEVER save/restore .RData on exit/startup, which is a lifestyle endorsed by many #rstats folks (including me). Just did a clean install and got my first chance to use this on my own behalf.

Slides: Zen And The aRt Of Workflow Maintenance.

Rstudio and makefiles: Mind your options! I have written this post mostly for myself.

I don’t want to waste 2 hours on this problem again at some point in the future. Hopefully others might stumble on it too and save some aggravation. So, the issue I had this morning was writing up a makefile in RStudio. I am new to make and makefiles, but have been able to get them running successfully in the past; however, the makefiles I’ve used were mostly borrowed from others and only minor edits were made.

Reproducible Environments. By Sean Lopp. Great data science work should be reproducible.

The ability to repeat experiments is part of the foundation for all science, and reproducible work is also critical for business applications. Team collaboration, project validation, and sustainable products presuppose the ability to reproduce work over time. In my opinion, mastering just a handful of important tools will make reproducible work in R much easier for data scientists.

RStudio’s multiple cursors rule. (TL;DR: Come on.

This is pretty short. Productivity level up by harnessing the power of RStudio!)

Edit several lines at once in RStudio.

skim_tee() can be used in pipes.

Customising your Rprofile. Every time R starts, it runs through a couple of R scripts.

One of these scripts is the .Rprofile. This allows users to customise their particular set-up. However, some care has to be taken, because a broken script can cause R to break. If this happens, just delete the script! Full details of how the .Rprofile works can be found in my book with Robin on Efficient R programming. A few months ago, I noticed my Rprofile was becoming increasingly untidy, so I bundled it up into a single, opinionated package.

Installation

You can install the package from GitHub with:

    # install.packages("remotes")
    remotes::install_github("csgillespie/rprofile")

A guide to tools for collaboration with R.

This is a brief guide to using R in collaborative, social ways.

R is a powerful open-source programming language for data analysis, statistics, and visualization, but much of its power derives from a large, engaged community of users. This is an introduction to tools for engaging the community to improve your R code and collaborate with others. (Am I missing anything? Let me know in the comments and I’ll update this guide.)

Topics

Asking questions via e-mail, listservs and bulletin boards.

Collaborating on data science projects - RStudio Community. I help lead the Data Science org at UCSB, and generally we take the approach of team-driven projects.

For the structure of the project/repo we use the Cookie Cutter Template, although we’ve created a simplified version of it. We’ve picked up on Agile Team Management, so we do things like sprints and stand ups while meeting weekly. We have them create a general outline, which can be changed iteratively because often people won’t know what the project will look like, but it keeps them accountable. We also introduced the concept of milestones, which are simple outlines that help show us and them the progress they’ve made for each week or meeting time. Here’s a simple example; this helps tie the outline together, while having someone be the person in charge of making sure stand ups and blockers are addressed.

Example Collaborative Project: Movie Rating Prediction.

RStudio and GitHub.

Version control has become essential for me in keeping track of projects, as well as in collaborating.

It allows backup of scripts and easy collaboration on complex projects. RStudio works really well with Git, an open source distributed version control system, and GitHub, a web-based Git repository hosting service.

ukgovdatascience/dotfiles: ⚠️ Templates of tools to help prevent committing sensitive data to github.

Reproducible Analytical Pipeline. Producing official statistics for publications is a key function of many teams across government. It’s a time-consuming and meticulous process to ensure that statistics are accurate and timely. With open source software becoming more widely used, there’s now a range of tools and techniques that can be used to reduce production time, whilst maintaining and even improving the quality of the publications. This post is about these techniques: what they are, and how we can use them.

The current statistics production process.

Targets: Democratizing Reproducible Analysis Pipelines. Make-like pipelines enhance the integrity, transparency, shelf life, efficiency, and scale of large analysis projects. With pipelines, data science feels smoother and more rewarding, and the results are worthy of more trust.

"...looking to get your project/s organised in the new year? Hoping just to distract from feelings of impending doom/crushing loss of hope? I promise workflowing will make you feel better... and @wmlandau has made it SO EASY." (Dr Saras Windecker, @smwindecker, January 8, 2021)

{targets} and its predecessors are visionary work.

Targets

    install.packages("targets")

Making your projects more reproducible and efficient.

Workflows of Refactoring.

Pagedown: Paginate the HTML Output of R Markdown with CSS for Print.

Beginner's Guide to Travis-CI for R. Have you seen all those attractive green badges on other people’s R packages and thought, “I want a lovely green badge!”

"Always a nice feeling when Travis manages to actually build. #runconf16 pic.twitter.com/7qZfH2OEij" (Julia Silge, @juliasilge, April 1, 2016)

OF COURSE YOU DO.

Travis CI: Embedding Status Images.

Travis CI for R — Advanced guide.

Testing Your Project on Multiple Operating Systems.

Migrating from Travis CI to GitHub Actions for R packages. If you develop an R package on GitHub and aren’t using GitHub Actions, this post is for you. Whether you want to move your repository from Travis CI to GitHub Actions, or you’re not sure why that’s necessary, or even if you don’t know what GitHub Actions is, you should definitely read on. If you prefer a video walkthrough instead of reading, you can watch a video version of this post. For those who aren’t familiar with Travis CI: it’s a popular CI/CD (Continuous Integration/Continuous Delivery) provider.

It might sound scary, but it’s just a fancy term for something very useful: automating software workflows based on certain triggers in your GitHub repository.

Docker Intro for R Users.

Docker Images for R: r-base versus r-apt. 2019-01-21 R Docker Andrew B.

Deploying an R Shiny App With Docker.

RDCOMClient + Outlook email.

R send html email with inline images such as plots.

Automated Email Reports with R – Journey of Analytics. R is an amazing tool to perform advanced statistical analysis and create stunning visualizations.

However, data scientists and analytics practitioners do not work in silos, so these analyses have to be copied and emailed to senior managers and partner teams. Cut-copy-paste sounds great, but if it is a daily or periodic task, it is more useful to automate the reports. So in this blogpost, we are going to learn how to do exactly that.