An Introduction to Docker for R Users


(This article was first published on Colin Fay, and kindly contributed to R-bloggers)

A quick introduction on using Docker for reproducibility in R.

Disclaimer: this blog post is an introduction to Docker for beginners,
and will take some shortcuts 😉

What is Docker?

Docker is "a computer program that performs operating-system-level
virtualization, also known as 'containerization'"
(Wikipedia). As with any
first line of a Wikipedia article about tech, this sentence is obscure
to anyone not already familiar with the content of the article.

So, to put it more simply, Docker is a program that lets you
manipulate (launch and stop) multiple operating systems (called
containers) on your machine (your machine will be called the host).
Just imagine having 10 RaspberryPi with different flavors of Linux, each
focused on doing one simple thing, that you can turn on and off whenever
you need to; but all of this happens on your computer.

Why Docker & R?

Docker is designed to enclose environments inside an image / a
container. What this allows, for example, is to have a Linux machine on
a Macbook, or a machine with R 3.3 when your main computer has R 3.5.
Also, this means that you can use older versions of a package for a
specific task, while still keeping the package on your machine up to date.

This way, you can "solve" dependency issues: if ever you are afraid
dependencies will break your analysis when packages are updated, build a
container that will always have the software versions you desire: be
it Linux, R, or any package.

Docker images vs Docker containers

On your machine, you're going to need two things: images, and
containers. Images are the definition of the OS, while the containers
are the actual running instances of the images.
You install an image only once, while the containers are to be launched whenever
you need this instance. And of course, multiple containers of the same
image can be run at the same time.
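To see the difference on your own machine, the Docker command line can list both. Here is a minimal sketch, assuming Docker is installed and running. The function name docker_overview is made up for this example, and wrapping the commands in a function simply means nothing runs until you call it.

```shell
# Hypothetical helper: show what Docker currently knows about on this machine.
docker_overview() {
  echo "== Images (installed once) =="
  docker image ls
  echo "== Containers (running and stopped instances) =="
  docker container ls --all
}
```

Calling docker_overview after a few runs should show one line per image, and possibly several containers created from the same image.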

To compare with R, this is the same principle as installing vs loading a
package: a package is to be downloaded once, while it has to be launched
every time you need it. And a package can be launched in several R
sessions at the same time.


A Docker image is built from a Dockerfile. This file is the
configuration file, and describes several things: from what previous
Docker image you are building this one, how to configure the OS, and
what happens when you run the container. In a sense, it's a little bit
like the DESCRIPTION + NAMESPACE files of an R package, which
describe the dependencies of your package, give metadata, and define
what is available to the users library()ing the package.

So, let's build a very basic Dockerfile for R, focused on
reproducibility. The idea is this one: I have today an analysis that
works (for example contained in a .R file), and I want to be sure this
analysis will always work in the future, regardless of any update to the
packages used.

So first, create a folder for your analysis, and a Dockerfile:
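From a terminal, this step can be sketched as follows. The folder name ~/mydocker is an assumption, chosen because it matches the ~/mydocker/results path used later in this post; any folder will do.

```shell
# Create a working folder for the analysis (the name is an assumption)
mkdir -p ~/mydocker
cd ~/mydocker

# Create an empty Dockerfile; note the file has no extension
touch Dockerfile
```

The R script for the analysis will later go in this same folder, next to the Dockerfile.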


The Dockerfile starts with FROM, which describes what image we
are building our image from. There are a lot of official images, and you
can also build from a local one.

This is, in a way, describing the dependency of your image; just
as in R, when building a package, you always depend on another package
(be it only the {base} package).

If you're going for an R based image, Dirk Eddelbuettel and Carl Boettiger
are maintaining rocker, a collection
of Docker images for R you can use. We'll use rocker/r-base in
this blog post.

FROM rocker/r-base


Once we've got that, we'll add some RUN statements: these are commands
which mimic command line commands. Remember what we want: an image that
will, ad vitam aeternam, run an analysis as if we were still today. So
we'll rely on {checkpoint}, which installs packages as they were on a
given date.

The command to make R execute something, from the terminal, is R -e
followed by the code to run, in quotes. Let's add a {checkpoint} installation.

FROM rocker/r-base
RUN R -e "install.packages('checkpoint')"

We need a /root/.checkpoint folder to use {checkpoint}, so let's create
that one with mkdir (make directory).

FROM rocker/r-base
RUN R -e "install.packages('checkpoint')"
RUN mkdir /root/.checkpoint


Now, I need to get the script for my analysis from my machine (the host) to
the container. For that, we'll need to use COPY, which takes the local
file followed by its destination path in the container. I'll first
create a folder to receive everything, with mkdir. Note that here,
myscript.R must be in the same folder as the Dockerfile on your computer.

Let's say this is the content of myscript.R:

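library(checkpoint)
checkpoint("2018-07-24")  # example snapshot date (an assumption): use the day your analysis worked
library(tidystringdist)
df <- tidy_comb_all(iris, Species)  # all pairwise combinations of species names
p <- tidy_stringdist(df)            # string distances for each pair
write.csv(p, "p.csv")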

Here, the {tidystringdist} that will be installed in the machine will
be the one from the date of today, even if I run this in one
year, or two, or four.

FROM rocker/r-base
RUN R -e "install.packages('checkpoint')"
RUN mkdir /root/.checkpoint
RUN mkdir /home/analysis
COPY myscript.R /home/analysis/myscript.R


CMD is the command to be run every time you launch the container. What
we want is myscript.R to be sourced.

FROM rocker/r-base
RUN R -e "install.packages('checkpoint')"
RUN mkdir /root/.checkpoint
RUN mkdir /home/analysis
COPY myscript.R /home/analysis/myscript.R
CMD R -e "source('/home/analysis/myscript.R')"

Build, and run


Now, go and build your image. From your terminal, in the directory where
the Dockerfile is located, run:

docker build -t analysis .

-t name sets the name of the image (here, analysis), and . means it
will build the Dockerfile in the current working directory.


Then, just launch with:

docker run analysis

And your analysis will be run 🎉!


One thing to do now: you want to access what is created by your analysis
(here p.csv) outside your container; i.e., on the host. Because yes,
as for now, everything that happens in the container stays in the
container. So what we need is to make the Docker container share a
folder with the host. For that, we'll use what is called a volume, which
is (roughly speaking) a way to tell the Docker container to use a
folder from the host as a folder inside the container.

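That way, everything that is created in this folder by the container
will persist after the container is closed. To do this, we'll use
the -v flag when running the container, with
path/from/host:/path/in/container. Also, create a folder to receive
the results both in the container and on the host (and rebuild the
image, since the Dockerfile has changed):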

FROM rocker/r-base
RUN R -e "install.packages('checkpoint')"
RUN mkdir /root/.checkpoint
RUN mkdir /home/analysis && mkdir /home/results
COPY myscript.R /home/analysis/myscript.R
CMD cd /home/analysis && R -e "source('myscript.R')" && mv /home/analysis/p.csv /home/results/p.csv

mkdir ~/mydocker/results
docker run -v ~/mydocker/results:/home/results analysis

Wait for the computation to be done, and…




So now, every time you launch this Docker image, the analysis will be
performed and you'll get the result back. With no problem of
dependencies: the packages will always be installed from the day you
desire. Although, this can be a little bit long to run as the packages
are installed each time you run the container. But as I said in the
Disclaimer, this is a basic introduction to Docker, R and
reproducibility, so the goal was more to get beginners on board with
Docker 🙂
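If you find yourself repeating the build-then-run cycle, the steps from this post can be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name run_analysis is made up, the default folder ~/mydocker is the one assumed in this post, and --rm (which removes the container once it has finished) is an addition that the commands above do not use.

```shell
# Hypothetical helper: rebuild the image, then run it with a shared results folder.
run_analysis() {
  # Folder containing the Dockerfile and myscript.R (use an absolute path)
  project_dir="${1:-$HOME/mydocker}"
  results_dir="$project_dir/results"

  # Make sure the host folder that receives the results exists
  mkdir -p "$results_dir"

  # Rebuild so that any change to the Dockerfile or myscript.R is picked up
  docker build -t analysis "$project_dir" || return 1

  # Share the host results folder with /home/results inside the container
  docker run --rm -v "$results_dir:/home/results" analysis
}
```

Calling run_analysis with no argument uses ~/mydocker, and p.csv should then appear in ~/mydocker/results once the computation is done.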


  • Use {packrat}, and get the
    library bundle into the container.

  • Use remotes::install_version() if you want your analysis to be
    based on package versions instead of a time based installation.

FROM rocker/r-base
RUN R -e "install.packages('remotes'); remotes::install_version('tidystringdist', '0.1.2')"
...

  • Use the Volume trick to bring data into your container, so that
    any data will be analysed in the very same environment.
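For the last idea, here is one way it could look, reusing the -v flag from earlier. This is a sketch and not a tested recipe: the function name run_analysis_on_data and the /home/data path are made up, myscript.R would have to be changed to read its input from /home/data, and the :ro suffix (read-only mount) is an extra precaution so the container cannot modify your original data.

```shell
# Hypothetical helper: run the analysis image on a data folder from the host.
run_analysis_on_data() {
  data_dir="$1"      # absolute path to the input data on the host
  results_dir="$2"   # absolute path where results should land on the host

  mkdir -p "$results_dir"

  # First volume brings the data in (read-only), second one gets the results out
  docker run --rm \
    -v "$data_dir:/home/data:ro" \
    -v "$results_dir:/home/results" \
    analysis
}
```

For example, run_analysis_on_data ~/mydocker/data ~/mydocker/results would analyse whatever is in ~/mydocker/data with the exact same R and package versions every time.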

And other cool stuff, but that's for another blog post 😉

Know more about Docker


