Using Linux Distribution Tools to Build Container Images – Part 1: Problems with Container Image Creation
In this post, we’ll talk about current issues with container image creation. Next time, we’ll review the performance of a representative set of Linux distribution tools in the context of custom container creation. In the final post, we’ll define a vision for app-centric container image creation, a workflow that isn’t available out of the box today but that we believe could address many of the challenges we currently face with container image creation. All three posts are based on a presentation we gave at KubeCon + CloudNativeCon North America 2019.
Let’s start with a definition. What is a container image?
A container image is effectively a packaging format – a mechanism for delivering an application and any dependencies required by that application (program and data) to the target system where the application will run.
When we think of container images in this way, it’s natural to compare them to other packaging formats (like RPM, for example). The question, then, is what we can learn from well-established package management formats and their tooling to create both packages and the ecosystems and content repositories that support them.
Industry-standard practices for container image creation
First, though, it’s worth noting how we usually create container images. Typically:
- An image is bootstrapped from a base operating system (OS).
- The language/framework native tooling (SDK) is installed on top of this base OS.
- And then that language/framework tooling is used to install the target application and its native language/framework dependencies.
Frequently, the first two steps are implicitly merged, e.g. the deployment of a Go-based application might start from a base image that already includes the Go toolchain.
The pattern above is most commonly implemented through a Dockerfile, taking a base OS image, configuring the SDK and dependencies, and then building the target application inside the container image for deployment.
These standard practices are problematic for a variety of reasons, which become clear when we think abstractly about the properties that we’d ideally like container images to have.
Desirable properties for container images
If, as suggested above, we look at container image creation through a packaging format lens, several desirable properties become apparent. Container images should:
- Be repeatable: we should be able to recreate an equivalent container image at an arbitrary time in the future.
- Have identifiable contents: so we can reason about license compliance implications and identify known security vulnerabilities.
- Feature up-to-date content: to lessen the chance that they contain known vulnerabilities and/or bugs.
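To make the first property a little more concrete, here is a minimal sketch of how one might check whether two builds produced equivalent content. It assumes the filesystem of each image has already been unpacked into a directory; the function name `tree_digest` is our own invention for illustration, not part of any container tooling. It hashes only relative paths and file contents, deliberately ignoring timestamps, and it does not account for ownership, permissions, or symlinks, which a real comparison would need to consider.

```python
import hashlib
import os


def tree_digest(root):
    """Hash relative paths and file contents under root, ignoring timestamps.

    Hypothetical helper for illustration: regular files only, no metadata.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place makes os.walk visit subdirectories in a stable order.
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            relative = os.path.relpath(path, root)
            # Separate the path from the content so the two cannot run together.
            digest.update(relative.encode("utf-8") + b"\0")
            with open(path, "rb") as handle:
                digest.update(handle.read())
            digest.update(b"\0")
    return digest.hexdigest()
```

If two builds of the same image definition, performed at different times, yield the same digest for their unpacked trees, that is at least some evidence the build is repeatable in the sense described above.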
Yet industry-standard practices for container image creation make it difficult to maintain these properties. They do this by:
- Performing non-deterministic operations (using RUN in the Dockerfile).
- Encoding build-time-dependent state in the image (e.g. through apt-get update).
- Inserting untraceable files into the filesystem (using ADD/COPY).
- Starting from an outdated base OS (using FROM).
These are common problematic patterns, but they are far from the only ways in which the desirable properties identified above are undermined. Preserving these properties with the industry-standard workflows requires extreme diligence on the part of those creating the container image.
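As a rough illustration of the patterns listed above, the sketch below scans a Dockerfile and flags the instructions where these properties are most at risk. Everything in it is an assumption made for the example: the sample Dockerfile (image name, package, paths) is hypothetical, and `flag_risky_instructions` is our own name, not an existing tool. The check is deliberately coarse. It flags every FROM, RUN, ADD, and COPY for human review, and it does not handle line continuations or multi-stage builds.

```python
import re

# Hypothetical Dockerfile following the base OS / SDK / application pattern.
# The image name, package, and paths are illustrative assumptions.
SAMPLE_DOCKERFILE = """\
FROM debian:latest
RUN apt-get update && apt-get install -y golang
COPY . /src
RUN cd /src && go build -o /usr/local/bin/app .
"""


def flag_risky_instructions(dockerfile_text):
    """Return (line_number, instruction, reason) tuples for flagged lines."""
    findings = []
    for number, raw in enumerate(dockerfile_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        instruction, _, rest = line.partition(" ")
        instruction = instruction.upper()
        if instruction == "FROM":
            findings.append((number, instruction, "base OS content may be out of date"))
        elif instruction == "RUN":
            if re.search(r"\bapt-get\s+update\b", rest):
                findings.append((number, instruction, "encodes build-time package index state"))
            else:
                findings.append((number, instruction, "arbitrary command, possibly non-deterministic"))
        elif instruction in ("ADD", "COPY"):
            findings.append((number, instruction, "inserts files whose origin is not recorded"))
    return findings


for number, instruction, reason in flag_risky_instructions(SAMPLE_DOCKERFILE):
    print(f"line {number}: {instruction}: {reason}")
```

Note that every line of this short, entirely ordinary Dockerfile gets flagged, which is the point: the standard workflow leans on exactly the instructions that make the desirable properties hard to preserve.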
It was this realization that got us started on our investigation. There has to be a better way!
Not only are the three desirable properties we identified above difficult to preserve, but container images built with industry-standard mechanisms are also typically more bloated than they need to be.
The standard pattern (taking a base OS, installing an SDK, and then using the SDK to install the application) often results in images with far more content than is required to run the target workload. Such bloat leads to higher storage and bandwidth demands, more burdensome compliance obligations, and a larger attack surface. A larger attack surface, in turn, means that vulnerabilities must be tracked across a larger set of components in order to determine when an image should be updated or replaced.
Linux distribution tools for building container images
Linux distribution tools have long proven useful when it comes to creating packages and images that preserve the three desirable properties we identified above. Yocto Project, Debian, Buildroot, and Guix are just a few examples. Furthermore, tools like these are already an intrinsic part of the container image supply chain thanks to their use by operating system vendors (OSVs) in creating base OS images.
Next time, we will review these tools for their potential as container image creators and point to enhancements that could make them more generally useful for image creation in the future.