The Challenge of Establishing Compliance for Containers – and a Few Ways to Do It

By Nisha Kumar

Last time I wrote about a thought-provoking diversity and inclusion pre-conference event that I attended at All Things Open last month. I attended the conference both to learn from my colleagues and to give a talk about my work on Tern, and I thought I’d share a few key points from that talk here.

The challenge in automating OSS compliance for containers stems from the fact that containers are built imperatively, even though there are build scripts and Dockerfiles for individual containers and configuration management for container orchestrators. After all, build scripts are not build manifests. Software build and release best practices call for declaring where we are starting from: a list of ingredients, if you will, needed to create the software. A Dockerfile may declare the pieces from which a container image is assembled, but it gives us no information about what is inside those pieces.

So, what can we do? Unfortunately, there is no straightforward way around this issue. But the container image’s metadata, and the image itself, contain clues you can use to figure out these ingredients.

To start with, you can simply run docker history, which gives you some basic clues about what the container holds and where its files came from. However, the information here is limited: most containers are built on top of existing images, so the history reveals nothing about how the base image’s layers were created.
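For example, here is a minimal sketch of checking an image’s history from a script, assuming you have Docker installed and a local image to inspect (the debian:buster tag below is just a placeholder):

```python
import subprocess

# Show the layer-by-layer history of a local image. Layers inherited
# from a base image built elsewhere appear as "<missing>" in the
# IMAGE column, which is exactly the lost-provenance problem
# described above.
result = subprocess.run(
    ["docker", "history", "--no-trunc", "debian:buster"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```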

To get more information, you can run docker save. This writes the image out as a tarball, which you can untar into a folder. In the untarred folder, you will find a manifest.json file. You would be right in thinking that this is the container image manifest. On closer inspection, though, you will find that it does not contain the specific information you need for OSS compliance (i.e., the software that was installed in the container image). The manifest does, however, contain the name of the image’s config file, which holds quite a lot of useful information about the individual layers that make up the image, as well as the paths to the tarballs of the non-empty layers. From there you can unpack the layers and scan the files using static analysis tools to get the compliance information you need. You can even use the security scanners already in use at your organization.
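To make that concrete, here is a rough sketch of the whole sequence in Python, assuming Docker is installed; the image tag is a placeholder, and the field names (Config, Layers, history) come from the format docker save produces:

```python
import json
import subprocess
import tarfile
from pathlib import Path

image = "debian:buster"           # placeholder tag; use your own image
workdir = Path("image-contents")  # where the saved image gets unpacked
workdir.mkdir(exist_ok=True)

# `docker save` writes the whole image (manifest, config, and one
# tarball per layer) to stdout as a single tar archive.
with open("image.tar", "wb") as f:
    subprocess.run(["docker", "save", image], stdout=f, check=True)

with tarfile.open("image.tar") as tar:
    tar.extractall(workdir)

# manifest.json is a list with one entry per saved image; each entry
# names the config file and the layer tarballs.
manifest = json.loads((workdir / "manifest.json").read_text())[0]
config = json.loads((workdir / manifest["Config"]).read_text())

# The config's history records the command behind every build step;
# steps flagged "empty_layer" produced no filesystem layer.
for step in config["history"]:
    if not step.get("empty_layer"):
        print("layer-producing step:", step.get("created_by", "?"))

# These are the layer tarballs you would unpack and hand to your
# static analysis or security scanning tools.
for layer in manifest["Layers"]:
    print("layer tarball:", workdir / layer)
```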

This is all well and good, but it doesn’t give you the context you need to understand and fix your container build and release pipeline. For example, if your starting image contained one version of OpenSSL and you updated it in a later layer, you are now shipping two versions of OpenSSL in your image. A static analysis tool will not tell you how you ended up with two versions of OpenSSL; that is for you to figure out using the debugging methods described above.

But Tern can automate much of this work for you. I’ve written about Tern on this blog before, so I won’t go into great detail here; check out this post in particular to learn more about where it came from and what it does. Essentially, Tern inspects a container image and collects the metadata of the packages installed in it. This information can be used to solve some of the basic compliance issues that container build tools create.
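As a rough usage sketch, assuming Tern is installed and on your PATH (the subcommand and flag below match the releases I’m familiar with, so confirm against tern --help for your version):

```python
import subprocess

# Ask Tern to inspect a local image and print its package report.
# "debian:buster" is only an example tag; exact CLI flags can vary
# between Tern releases.
subprocess.run(["tern", "report", "-i", "debian:buster"], check=True)
```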

I concluded my talk with a question that’s worth repeating here: why should you care about open source software compliance? The answer is fairly simple: because your customers care. They will want answers to the kinds of questions that Tern is designed to help you answer.

In a way, this reflects the tremendous success of containers. Users of containerized enterprise services don’t just care about whether the software works; they also care about the ecosystem around it, including IP auditing, software provenance and traceability. We cannot answer those questions without deep knowledge of the containers we distribute, and we cannot gain that knowledge without the due diligence of tracking the software components that go into them.

Stay tuned to the Open Source Blog for more on container compliance and be sure to follow us on Twitter (@vmwopensource) as well.