docker build containers

Paul Bauer · Udacity Eng & Data · Apr 20, 2016

UPDATE

As of Docker 17.05, multi-stage builds supersede the advice in this article. If you have to use an older version of Docker, the build-container pattern described below is still a suitable strategy; otherwise, I recommend multi-stage builds.
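For reference, here is a minimal multi-stage sketch of the same idea; the image tags, app name, and paths are illustrative:

    # Build stage: the compiler and source live here and are discarded afterwards.
    FROM golang:1.8-alpine AS builder
    COPY . /go/src/my-awesome
    RUN go build -o /my-awesome my-awesome

    # Production stage: only the compiled artifact is carried over.
    FROM alpine:3.5
    COPY --from=builder /my-awesome /usr/local/bin/my-awesome
    ENTRYPOINT ["/usr/local/bin/my-awesome"]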

intro

This is the first post in a series on how we use Docker at Udacity. It covers Docker Build Containers and was adapted from our own Docker best practices guide.

prerequisites

If you are new to Docker or would like a refresher on Docker concepts like images, Dockerfiles, and containers, see A Beginner-Friendly Introduction to Containers, VMs and Docker by Preethi Kasireddy.

alpine linux

All examples use Alpine Linux, a security-oriented, lightweight Linux distribution based on musl libc and busybox. For those familiar with Debian's apt and CentOS's yum, the Alpine analog is apk.
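For example, installing a package looks like this (the package name is just an illustration):

    # apk is Alpine's package manager; --no-cache avoids writing the package
    # index under /var/cache/apk, which matters for image size later on.
    apk add --no-cache curl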

image layers

We use Docker images for packaging, but we also use Docker containers to build the artifacts that go into the image. To understand why and how, we first need to understand Docker layers.

Docker images are typically built using an executable description called a Dockerfile. Each instruction in the Dockerfile (RUN, COPY, LABEL, etc.) creates a separate layer, and layers are strictly additive: a later layer can hide files from an earlier one, but it can never reclaim their space. For more details, the official Dockerfile Best Practices guide has advice on minimizing the number of layers.

A common Dockerfile mistake is using a separate RUN instruction to clean up temporary files.
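Sketched with an illustrative package, the mistake looks like this:

    FROM alpine:3.3

    # this RUN commits a layer that includes the apk cache files
    RUN apk update && apk add build-base

    # this RUN commits a *new* layer with deletion markers; it cannot
    # shrink the layer above
    RUN rm -rf /var/cache/apk/*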

Unfortunately, we still pay the price of those cache files because layers are additive. The RUN rm … command creates a new layer containing tombstone (whiteout) markers; it does not remove the cached files from the previous layer. You can see this by examining docker history for the built image.
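For example, with an image built from the sketch above (udacity/layer-demo is a stand-in name):

    # each row of the output is a layer with its size; the rm layer appears
    # as its own entry while the earlier RUN layer keeps its full size
    docker history udacity/layer-demo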

We need to move the cleanup rm to the same layer that creates the intermediate files.
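Chaining the install and the cleanup into one RUN does exactly that:

    FROM alpine:3.3

    # install and remove the apk cache in a single layer, so the cache
    # files are never committed to any layer
    RUN apk update && apk add build-base && rm -rf /var/cache/apk/*

On Alpine specifically, apk add --no-cache achieves the same result without the explicit rm.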

Compare the docker history results with what we had before this optimization.

build containers

A common anti-pattern is using the production Dockerfile and image to isolate builds:

  1. use RUN to install development tools
  2. use COPY to include the project source
  3. use another RUN to build artifacts (a jar, an ELF binary, static assets, etc.)

This pattern is convenient because it keeps all of the build tooling self-contained; the build will work on a developer laptop and on a build server without having to pre-install compilers or build tools like Maven, Gradle, or npm.

But all of those extra bits go unused once the image is deployed to production. This is bad not just because of bloat, but because every extra package enlarges the attack surface; see Shellshock. And as we just saw in the layers section, we can't use RUN rm to clean up the build tools and source code later in the Dockerfile.

example

Here's a Go example of what is convenient but not optimal, since it keeps the intermediate build tools and source around. The following is built and tagged udacity/my-awesome:big.
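A sketch of that Dockerfile; the app name my-awesome and the paths are illustrative:

    FROM alpine:3.3

    # 1. build tooling baked into the image
    RUN apk update && apk add go git

    # 2. project source baked into the image
    COPY . /go/src/my-awesome
    ENV GOPATH /go

    # 3. build the artifact; the compiler, git, and the source all remain
    #    in the image's layers
    RUN go build -o /usr/local/bin/my-awesome my-awesome

    ENTRYPOINT ["/usr/local/bin/my-awesome"]

Built with docker build -t udacity/my-awesome:big .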

enter the build container

Instead, use the Build Container pattern. In place of one container, two are used: one for building and one for production. The build container holds all of the development and build-time bits like git and the Go compiler. Built artifacts are retrieved from the build container by volume-mounting a work area; afterwards, they are COPY'd into a minimal production image. Here's a more optimal Dockerfile for our example.
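A sketch of that production Dockerfile; it only copies in the pre-built binary:

    FROM alpine:3.3
    COPY my-awesome /usr/local/bin/my-awesome
    ENTRYPOINT ["/usr/local/bin/my-awesome"]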

And here is how to use a build container to build the udacity/my-awesome:1.0 image.
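Roughly, assuming the source lives in the current directory and using golang:1.6-alpine as the build container (both assumptions):

    # build inside a throwaway container, volume-mounting the source tree
    # as the work area; the compiled binary is written back to the host
    docker run --rm \
      -v "$PWD":/go/src/my-awesome \
      -w /go/src/my-awesome \
      golang:1.6-alpine \
      go build -o my-awesome

    # package only the artifact into the minimal production image
    docker build -t udacity/my-awesome:1.0 .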

The size difference between the two approaches is substantial.
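To see it yourself, list both tags side by side (actual sizes vary by project):

    docker images udacity/my-awesome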

This principle applies to other platforms as well. For example, never call npm install inside a production Dockerfile for a Node app.
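The same trick, sketched for Node; the node:4 image and the paths are assumptions:

    # install dependencies in a throwaway container; node_modules lands in
    # the mounted work area, and the production Dockerfile only COPYs it in
    docker run --rm -v "$PWD":/app -w /app node:4 npm install --production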

If the build container is complex or takes a long time installing dependencies, it will warrant its own image and Dockerfile.

conclusion

Use separate containers to isolate your project builds. The project will build as easily on a developer laptop as it does on a build server, all while keeping your deployed Docker images as small as possible.
