This is the last in a three-part series based on the presentation on building container images that we gave at KubeCon + CloudNativeCon North America 2019.
In part two, we described two experiments that got us part of the way to establishing how we might create container images that are repeatable, identifiable, and up-to-date.
Based on the work that we’ve done, we can suggest some areas for future work that might take us to the holy grail of app-centric container images.
What are app-centric container images?
In part one, we laid out why we would want a container image that holds only the application’s build-time or run-time dependencies. In part two, we illustrated how the distro packaging tools we looked at can get us there. In particular, we highlighted “guix pack”, which packages an app and its runtime dependencies and nothing else. This method builds a container image by starting from the app and working down to create a runtime OS, rather than starting from the OS and building up to the app. The OS-up method is intuitive for developers because it mirrors a development paradigm they are already used to: they can simply forklift that method and containerize their applications. It is also part of the reason why existing container images are bloated and full of security holes. Our vision is to use the app-down approach when containerizing an application.
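To make the difference between the two directions concrete, here is a minimal sketch in Python. It assumes a hypothetical dependency graph with made-up package names and does not reflect how “guix pack” is implemented. It only illustrates what "starting from the app and working down" means: compute the transitive runtime closure of the app and ship nothing else.

```python
from collections import deque

# Hypothetical dependency graph: package -> direct runtime dependencies.
# The package names are illustrative only, not real distro metadata.
RUNTIME_DEPS = {
    "myapp": ["libssl", "libc"],
    "libssl": ["libcrypto", "libc"],
    "libcrypto": ["libc"],
    "libc": [],
    "bash": ["libc", "readline"],
    "readline": ["libc"],
    "coreutils": ["libc"],
}


def runtime_closure(app, deps):
    """Return the app plus everything it transitively needs at run time."""
    seen = {app}
    queue = deque([app])
    while queue:
        pkg = queue.popleft()
        for dep in deps.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# App-down: only the closure of the app ends up in the image.
app_centric = runtime_closure("myapp", RUNTIME_DEPS)

# OS-up: the base image ships every package, whether the app needs it or not.
os_up = set(RUNTIME_DEPS)

print("app-centric image:", sorted(app_centric))
print("extra packages in OS-up image:", sorted(os_up - app_centric))
```

In this toy graph, the packages that appear only in the OS-up image are exactly the ones that add size and attack surface without serving the app.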
Prior work
Managing container bloat and layer dependencies has been a problem for years, and many projects have tried to solve it in various ways. So far, nobody has figured out how to create an app-centric container image while at the same time maintaining build repeatability and keeping the dependencies up-to-date. Currently, there seem to be three approaches to this:
- Supplying a minimal base OS for developers to use in their Dockerfiles. Alpine has long provided this, as has Google Distroless. The trouble here is that these images cannot be traced back to sources and are missing a lot of metadata that can provide clues to their provenance. We have to trust the suppliers to keep the images up-to-date and maintain business continuity. They may not have any incentive to do so, depending on their resources and priorities.
- Stripping files after a docker build. Projects like Oracle Smith (now deprecated) and DockerSlim do this. As with the first method, the images cannot be traced back to sources, and they are missing all the metadata because these tools focus only on the files required to run an app. There is no guarantee of build repeatability, nor any way of keeping the images up-to-date.
- Controlling the container supply chain from the source and providing a client tool to query and build from the supply chain. Bitnami, Cloud Native Buildpacks and Chef’s Habitat offer these. These suppliers guarantee repeatability and compliance, and keep their images up-to-date. However, the images are not minimal and often order components in filesystem layers. These suppliers may not guarantee high quality images unless you have a specific contract or agreement with them.
Actionable learnings
Here are some things that you can do right now in your organization to help build container images which better preserve the properties of repeatable, identifiable and up-to-date:
- Inspect your containers to see what is installed in them. There are several tools that help you do this. Dive helps you figure out your container image’s layout and what files it holds. Container-diff helps you figure out the difference between the files in each of your container image builds. Tern can help you build a Software Bill of Materials (SBoM) for known operating systems like Debian.
- Use one of the four tools that we’ve talked about, even if they aren’t perfect. DebOS in particular is straightforward to use and easy to pick up. You can use it to make both a base OS and the development tools and/or runtime dependencies that you need, and then you will at least know the software bill of materials for your base OS. Guix has great potential for creating container images that are just apps and their dependencies, but at present you still need to write a lot of glue code to get there. You can also upstream your changes to Guix and thereby make the user experience more seamless.
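As a rough illustration of the inspection step above, the following Python sketch compares the file listings of two builds of the same image. It is not how Container-diff works internally. The file paths, contents and the `diff_builds` helper are hypothetical, and in practice the listings would come from a tool that extracts them from your images.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest used to detect files that changed between builds."""
    return hashlib.sha256(data).hexdigest()


def diff_builds(old, new):
    """Compare two {path: digest} maps; report added, removed, changed paths."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "changed": sorted(
            p for p in old_paths & new_paths if old[p] != new[p]
        ),
    }


# Hypothetical file listings from two builds of the same Dockerfile.
build_1 = {
    "/usr/bin/myapp": digest(b"app v1"),
    "/usr/lib/libssl.so": digest(b"libssl build A"),
    "/etc/motd": digest(b"hello"),
}
build_2 = {
    "/usr/bin/myapp": digest(b"app v1"),
    "/usr/lib/libssl.so": digest(b"libssl build B"),
    "/usr/share/doc/README": digest(b"docs"),
}

report = diff_builds(build_1, build_2)
for kind, paths in report.items():
    print(kind, paths)
```

A repeatable build would produce an empty report for two builds from the same inputs. Any entry under "changed" for a file you did not touch is a hint that a dependency moved underneath you.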
You might also be interested in our Lab notebook for the work we’ve described here and that will show you how we got to the results that we did. If it spurs any ideas on how we take this work further, we’d love to hear from you.