Testing Linux Distribution Tools for Custom Container Image Creation

This is the second post in a three-part series based on the talk about building container images that we gave at KubeCon + CloudNativeCon North America 2019.

In our first post, we described problems that we’ve observed with standard industry practices around creating container images, the most significant being that these practices make it difficult to ensure that containers remain repeatable, identifiable, and up-to-date.

As we mentioned at the end of that post, a number of Linux distribution tools already exist that can preserve these properties when building packages and images. This time, we look at some of those tools in the context of custom container image creation.

Challenges with Distribution Tools

Distribution tools have historically been difficult both to understand and to use. They also typically suffer from a tension between being developer-friendly and being sophisticated enough to let you deliver production-grade software.

Docker, for example, made it easy for app developers to become operating system vendors, which is a big part of why it became so popular. But Docker doesn’t give you the information you need to make engineering decisions, such as whether your artifact is okay to ship. Its convenience comes at a serious cost.

A similar tension operates at the ecosystem level. Docker is popular in part thanks to the many components available to app developers for creating deployment-ready apps. On the other hand, it is very difficult to reason about the trustworthiness of these components. Here, the tension is between available content and trusted content.

Lastly, developers want distribution tools to work quickly so they can iterate as fast as possible. That speed is in tension with their need to deploy a reliable app, because tools for building reliable software almost by definition do their work more slowly.

Testing Out Distribution Tools

Given those tensions, is it possible to use distribution tools to create container images that are repeatable, identifiable, and up-to-date? We decided to try to find out.

We first had to pick some distribution tools to experiment with. From the many available, we chose four that we thought had a chance of success:

Debian (DebOS) is a reliable, long-lived project with a large archive of software that is already packaged.
Buildroot is an embedded image creator designed for building Linux-based firmware. It doesn’t update an installed system in place; instead, it assumes that you will replace the whole image whenever you need an update, which makes it somewhat similar to a container system.
The Yocto Project offers more affordances than other tools for building multiple images with a shared core, but it makes trade-offs to do so.
Guix SD is built on functional package management concepts that emphasize repeatability and immutable operating systems, both of which are highly relevant to the container space. Its disadvantage is that it uses Scheme as its package definition and configuration language, which makes it hard to use.

We then attempted to build different types of container images using these tools and evaluated them on four criteria:

Ease of use
Output image size (is it smaller than what is already available on Docker Hub?)
The engineering effort required to make the container image
The presence and quality of the resulting software bill of materials, which would allow us to reason about things like compliance and security vulnerabilities
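To make the last criterion concrete: once an image comes with a bill of materials, security and compliance questions become simple queries over a list of components. The following is a minimal Python sketch, not the output format of any of the tools we tested; the Component record, the audit function, and all of the example data (including the version treated as vulnerable) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One entry in a simplified, hypothetical software bill of materials."""
    name: str
    version: str
    license: str


def audit(sbom, vulnerable, allowed_licenses):
    """Return (components with known vulnerabilities, components with disallowed licenses).

    vulnerable: set of (name, version) pairs, e.g. taken from a vulnerability feed.
    allowed_licenses: set of license identifiers the organization accepts.
    """
    vulns = [c for c in sbom if (c.name, c.version) in vulnerable]
    license_issues = [c for c in sbom if c.license not in allowed_licenses]
    return vulns, license_issues


# Example data is invented for illustration only.
sbom = [
    Component("zlib", "1.2.11", "Zlib"),
    Component("openssl", "1.1.1c", "OpenSSL"),
    Component("readline", "8.0", "GPL-3.0-or-later"),
]
vulns, license_issues = audit(
    sbom,
    vulnerable={("openssl", "1.1.1c")},  # pretend this version has an open advisory
    allowed_licenses={"Zlib", "OpenSSL", "MIT"},
)
print([c.name for c in vulns])           # ['openssl']
print([c.name for c in license_issues])  # ['readline']
```

Without a bill of materials, neither question can be answered without first digging through the image by hand.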

Distributions in the Lab

Our first experiment explored using the tools to build a base OS for developing Go applications.

DebOS proved very easy to use, although you do need to run it on a Debian-based operating system such as Ubuntu. The image we created was half the size of the image available on Docker Hub, and the engineering effort was modest (DebOS uses a YAML recipe that isn’t that dissimilar from a Dockerfile). Where it fell short was the software bill of materials: it offers no direct way to produce one (although you can use other tools to get there).
Buildroot was not as straightforward to use. It took a long time to figure out exactly how to do what we wanted, and we never managed to build an image with a Go binary in it. Indeed, it took a lot of effort to build any kind of container image. But once we got there, it was quite easy to create a software bill of materials in a variety of formats.
The Yocto Project is another tool that’s not easy to learn, and it required a lot of engineering effort to generate an image. But it did produce fairly small images and generated a comprehensive, customizable software bill of materials.
Lastly, Guix is driven by an easy-to-use command-line tool, but packaging the target application took a lot of work beyond that. Guix also produced very large images that took a fair bit of engineering effort to generate, though recent discussions with a project maintainer indicate that this may be caused by a bug in the way Go is currently packaged for Guix. Unfortunately, it didn’t offer an automated software bill of materials. You can dig down to find that information, but you have to write your own code to do it.
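As an illustration of the kind of code you end up writing for Guix, here is a minimal Python sketch that turns a list of store paths into bill-of-materials entries. It assumes the paths come from something like “guix gc --requisites” (which lists the transitive closure of the given store items) and that each path has the form /gnu/store/<hash>-<name>-<version>. The name/version split is a heuristic chosen for illustration, and the hashes in the example are made up.

```python
import re

# /gnu/store/<32-character hash>-<rest>. Guix hashes use a base32 alphabet of
# digits and lowercase letters; this pattern only checks length and character class.
STORE_PATH = re.compile(r"^/gnu/store/([0-9a-z]{32})-(.+)$")


def split_name_version(rest):
    """Heuristic split: the version begins after the first '-' that precedes a digit."""
    m = re.search(r"-(?=\d)", rest)
    if m is None:
        return rest, ""  # no recognizable version, e.g. a generated profile
    return rest[: m.start()], rest[m.end():]


def bom_from_store_paths(lines):
    """Build sorted bill-of-materials entries from store paths, one per line."""
    entries = []
    for line in lines:
        m = STORE_PATH.match(line.strip())
        if m is None:
            continue  # ignore anything that is not a store path
        name, version = split_name_version(m.group(2))
        entries.append({"name": name, "version": version, "hash": m.group(1)})
    return sorted(entries, key=lambda e: (e["name"], e["version"]))


# Made-up store paths standing in for real command output.
paths = [
    "/gnu/store/0123456789abcdfghijklmnpqrsvwxyz-postgresql-10.10",
    "/gnu/store/zyxwvsrqpnmlkjihgfdcba9876543210-glibc-2.29",
]
for entry in bom_from_store_paths(paths):
    print(entry["name"], entry["version"])
# glibc 2.29
# postgresql 10.10
```

The heuristic mislabels some entries; for example, a store item for a package’s secondary output (such as a “lib” output) ends up with the output name inside its version string. A sturdier tool would query Guix’s own package metadata rather than parse path names.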

None of them was perfect, but we were able to use each of these tools to make a base OS that we could develop an app on top of. That fits the current development model of choosing a base OS, installing app dependencies on top of it, and then installing the app.

But we also wanted to push the envelope and see if we could build an “ideal container,” which is just the app and its dependencies, nothing more.

Building Custom, Minimal Container Images

First, a word about packaging. In the distribution space, to create a software package you describe, in a manifest file, what’s in the package and how to build and install it. A package manager then reads that metadata, including the application’s tree of dependencies, and installs what is required into the filesystem. That led us to ask whether we could use the package manager to install only what’s required into an image. Instead of shipping an entire OS image in our containers, could we start with the application and install just its dependencies into the container image?
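The core of that idea is computing the transitive closure of an application’s runtime dependencies from its package metadata. Here is a minimal Python sketch, assuming the manifests have already been reduced to a mapping from each package name to its direct runtime dependencies; the package graph in the example is hypothetical and heavily simplified, not Postgres’s real dependency list.

```python
from collections import deque


def runtime_closure(root, depends):
    """Return every package needed to run `root`, including `root` itself.

    depends maps a package name to the names of its direct runtime dependencies,
    as a package manifest would declare them. Breadth-first traversal; the `seen`
    set ensures each package is visited once, even if the graph has cycles.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in depends.get(pkg, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical, heavily simplified dependency metadata.
depends = {
    "postgresql": ["glibc", "openssl", "readline", "zlib"],
    "openssl": ["glibc"],
    "readline": ["glibc", "ncurses"],
    "zlib": ["glibc"],
    "ncurses": ["glibc"],
    "coreutils": ["glibc", "acl"],  # part of a typical base OS, not needed here
}
print(sorted(runtime_closure("postgresql", depends)))
# ['glibc', 'ncurses', 'openssl', 'postgresql', 'readline', 'zlib']
```

Real package managers also resolve version constraints, alternatives, and conflicts, which this sketch ignores. The point is only that the closure, not a whole OS, is what the application needs.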

So, in our second experiment, we tried to use the same four tools to make a container holding just the app and everything it needs to run.

In order to do that, we needed to create packages for the applications we wanted to build containers for. Because each of these tools comes from a different ecosystem, our already-steep learning curve grew steeper: each ecosystem has its own way of expressing metadata and of building and packaging applications. On top of that, the systems that target embedded and HPC use cases offer little support for applications built with cloud native languages and build tools such as Go, Ruby, or Bazel. Tools do exist that can help you automate package creation, but their cloud native support varies and is often quite minimal.

Given our experience with packaging for these build tools, we wondered whether we could reach our end goal, an application-first container image, with an app that was already packaged. So we set out to build a container with Postgres and only its dependencies. It turned out that we could make some progress with DebOS, Buildroot, and the Yocto Project, and that Guix would take us almost all the way to where we wanted to be.

With the first three, we were able to start with a minimal base operating system, install our packaged application and all of its dependencies, and then build a smaller container image with less bloat and a smaller attack surface.
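One rough way to see where the savings come from is to compare the package set of a base OS with the application’s dependency closure. In this hypothetical Python sketch, package counts stand in, crudely, for image size and attack surface, and both package lists are invented for illustration.

```python
def compare_images(base_os, app_closure):
    """Compare a base OS package set with the packages the application needs."""
    base_os, app_closure = set(base_os), set(app_closure)
    return {
        # Shipped in the base image but never needed by the app.
        "unused": sorted(base_os - app_closure),
        # Needed by the app but absent from the base, so installed on top of it.
        "to_install": sorted(app_closure - base_os),
        "base_plus_app_count": len(base_os | app_closure),
        "app_first_count": len(app_closure),
    }


# Invented package lists for illustration.
base_os = ["glibc", "bash", "coreutils", "apt", "ncurses", "perl"]
app_closure = ["postgresql", "glibc", "openssl", "readline", "zlib", "ncurses"]
report = compare_images(base_os, app_closure)
print(report["unused"])               # ['apt', 'bash', 'coreutils', 'perl']
print(report["base_plus_app_count"])  # 10
print(report["app_first_count"])      # 6
```

A base-OS image ends up with the union of both sets, while an application-first image would contain only the closure. Counting packages ignores their sizes; a more faithful comparison would weigh each package by its installed size.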

But Guix has a command called “guix pack.” It takes an application, gathers all of its dependencies, and packs them into a distributable format. It even has an output format that generates Docker-ready container images. So, in this case, we were able to generate a Postgres container image that held only Postgres, its runtime dependencies, and nothing else.

Solving the rest of the problem

In these two experiments, we made real progress but didn’t completely solve the problem we had set ourselves. We were able to create containers that were repeatable, identifiable, and up-to-date, but not in a way that met our usability goals: these tools required significant engineering effort. We did, however, find pointers to enhancements that might make them more generally useful.

Next time, we’ll share our thoughts on next steps the community can take to build truly app-centric container images.

For more open source insights, follow the Open Source Blog and stay tuned to our Twitter channel (@vmwopensource).
