Licenses are the conditions that creators set for the use of their code. If we ignore these existing license requirements as we add to a project, we not only fail to credit the creators of the code we are using but also impose extra risk on anyone who wants to use our new code. License compliance is the right thing to do, and it minimizes the risk we impose on ourselves and on our downstream community.
This is well understood in the more mature areas of the open source community. We know that when we distribute a build artifact, we inherit all of the license obligations pertaining to all of its software dependencies regardless of code origins. But what happens when it comes to containers? Although we’ve been distributing containers for only a relatively short time, their adoption has been so swift that license compliance has largely been ignored.
Unfortunately, that history of ignoring container compliance issues is now coming back to haunt us. With that said, let’s look at how containers typically get built. We still mostly build containers using Docker and then we automate that process with Dockerfiles:
- We download a container image from a container registry. That gives us a collection of tarballs containing files.
- We copy our app into the container. That app will have files that are probably the only ones that we know anything about.
- We run scripts to install dependencies, make modifications, build our binaries, etc.
- We upload our resulting container image, including the parts we downloaded, to a registry.
Crucially, we are now distributing not only our own files but all the files that we have downloaded. These almost always contain someone else’s code and binaries, and very often we won’t have any idea what their nature is. Yet when we distribute a container image, we inherit the legal obligations for every file in it, including the files that we downloaded from someone else.
Note that base images contain our app’s build-time and runtime dependencies. If our dependencies are statically compiled (in Go, for example), the dependencies’ code goes along with whatever we are distributing. And if they are under a copyleft license (say, GPLv3), then we need to comply with its obligations, such as providing the corresponding sources.
All of this can quickly get even more complicated. Images can contain a great many files under many different licenses, installed in many different ways. Given all this, there’s really only one way to deal with your legal obligations: you must keep close track of what’s in your container. In other words, to meet your license obligations, you must inventory each layer in every container image you build. (Note: removing files in a later layer is not going to help you, because every layer is immutable and is distributed with the rest of your container image.)
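To make the point about layers concrete, here is a minimal Python sketch of a per-layer inventory. It assumes an image archive in the layout that `docker save` has traditionally written: a tarball containing a `manifest.json` whose first entry lists the layer tarballs under `Layers`. The function names `inventory_layers` and `deleted_but_still_shipped` are made up for this illustration, and newer image layouts may need adjustments. The second function relies on the whiteout convention used by image layers, where deleting `libfoo.so` in a later layer adds a `.wh.libfoo.so` marker rather than touching the earlier layer.

```python
import io
import json
import posixpath
import tarfile


def inventory_layers(image_tar):
    """Return [(layer_name, [file paths])] for a `docker save` style archive.

    image_tar is either a filesystem path or a seekable binary file object.
    """
    source = {"name": image_tar} if isinstance(image_tar, str) else {"fileobj": image_tar}
    inventory = []
    with tarfile.open(mode="r:*", **source) as outer:
        manifest = json.load(outer.extractfile("manifest.json"))
        # Assumption: a single image per archive, so only manifest[0] is read.
        for layer_name in manifest[0]["Layers"]:
            layer_bytes = outer.extractfile(layer_name).read()
            with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as layer:
                files = [m.name for m in layer.getmembers() if not m.isdir()]
            inventory.append((layer_name, files))
    return inventory


def deleted_but_still_shipped(inventory):
    """List files that a later layer marks as deleted but an earlier layer still holds."""
    hits = []
    seen = set()
    for _layer_name, files in inventory:
        for path in files:
            head, base = posixpath.split(path)
            # ".wh.<name>" marks <name> as deleted; ".wh..wh..opq" hides a whole directory.
            if base.startswith(".wh.") and base != ".wh..wh..opq":
                target = posixpath.join(head, base[len(".wh."):])
                if target in seen:
                    hits.append(target)
        seen.update(files)
    return hits
```

Running `docker save -o image.tar` on an image and passing `"image.tar"` to `inventory_layers` should give one file list per layer. Anything reported by `deleted_but_still_shipped` is still being distributed in an earlier layer, even though it is invisible in the running container.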
Of course, this is easier said than done. Because it’s so difficult to keep track of the files in a container image, we’ve developed some internal guidelines at VMware to help us get it right, and they are worth sharing.
Ideally, you would use a base OS that you have built in-house from source, making license compliance straightforward. For VMware, this is Photon OS. Not all organizations can roll their own OS, though, so a good alternative is to use a “known good” OS: one for which you can reliably get a list of installed packages along with their licenses and sources. For example, Debian’s package manager can list installed packages and has a repeatable way of retrieving those packages’ sources. Check with your legal team on your choice of OS.
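As an illustration of what “reliably get a list of packages” can look like on Debian, here is a small Python sketch that reads the text of dpkg’s status database (normally `/var/lib/dpkg/status`) and returns the installed packages with their versions. The function names are hypothetical, and the parsing is deliberately simplified. The copyright path follows the Debian convention of shipping each package’s license text at `/usr/share/doc/<package>/copyright`.

```python
def parse_dpkg_status(text):
    """Return sorted (package, version) pairs for fully installed packages."""
    packages = []
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Lines starting with whitespace continue a multi-line field; skip them.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        status = fields.get("Status", "").split()
        # A fully installed package has a status like "install ok installed".
        if status and status[-1] == "installed" and "Package" in fields:
            packages.append((fields["Package"], fields.get("Version", "")))
    return sorted(packages)


def copyright_path(package):
    """Conventional location of a Debian package's license text."""
    return "/usr/share/doc/" + package + "/copyright"
```

On a Debian-based image, feeding the contents of `/var/lib/dpkg/status` to `parse_dpkg_status` gives the package inventory, and `apt-get source <package>` is the usual way to retrieve the matching sources.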
We then use build manifests to keep track of what we’re installing in a container. These can include pom.xml, requirements.txt, package.json, etc. Importantly, we do not use Docker’s non-reproducible multi-stage builds, in which Docker uses one container to build the binaries and a second container to distribute them. Docker then throws away the build environment, leaving you with no way to inventory your build dependencies.
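A build manifest is only useful for compliance if it pins exactly what gets installed. As a rough sketch, the following Python function splits the text of a requirements.txt into exactly pinned entries and everything else. The name `pinned_requirements` is invented for this example, and the parsing is a simplification that ignores much of the real requirements syntax (extras, URLs, hashes).

```python
def pinned_requirements(text):
    """Split requirements.txt text into ({name: version}, [unpinned lines])."""
    pinned = {}
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplified)
        if not line or line.startswith("-"):
            continue  # skip blank lines and options such as "-r other.txt"
        if "==" in line:
            name, version = line.split("==", 1)
            # Drop any environment marker that follows the version.
            pinned[name.strip().lower()] = version.split(";")[0].strip()
        else:
            unpinned.append(line)
    return pinned, unpinned
```

Any entry that lands in the unpinned list is a dependency whose installed version, and therefore whose exact license text and sources, cannot be read from the manifest alone.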
Lastly, we employ an inspection tool. You could use Tern, which is an open source inspection project that I maintain, or you could look at Dive or container-diff, which are used by many people with backgrounds in Kubernetes and Docker.
But here’s the bottom line: when we distribute a build artifact, whether it be an individual package or a container image, we inherit all of the license obligations for all of the software dependencies all the way down to the OS, regardless of where we got those binaries, the code or the container. Complying with license requirements is essential because it minimizes business and legal risk. But most of all, it is what we should be doing if we want to be good open source citizens.