
Best Practices for Building Compliant Containers

By Nisha Kumar, open source engineer, VMware

As things stand, we do not have a fully declarative way of building containers. We can’t, in other words, make a specific list of software components that we want in a container image and build the container to match exactly that list. In manufacturing, where I used to work, you start with a Bill of Materials (BoM) and send it through your product pipeline as you build the product. When the product is completed, you know exactly what went into it because you have the original BoM to refer to. We do this in software delivery, too, with build and release – we start with a list of software components that we know will go into the finished package and can refer back to it as needed.

But when it comes to containers, we have no way of actually building a container from something like a Bill of Materials. What we have are some kernel features that enable us to create files as we normally would on a native OS (e.g. by using the OS’s package manager or downloading from the internet). This is how we usually get our apps to work on our workstations, so of course, this feels familiar. It is also ad hoc and goes against software delivery pipeline best practices. You wouldn’t wget | bash from some random website during your build and release process, would you? (Hint: that’s a rhetorical question.)

Today’s container build tools range from the completely imperative, like Docker and rkt, to the somewhat declarative, like ansible-container, Bazel and Smith. For a comprehensive list of container build tools, I would refer you to Michael Ducy’s excellent talk at KubeCon 2017: https://youtu.be/5D_SqLv92V8. Most container images out there, however, are built using “docker build” with a Dockerfile, so for most containers, we can’t be sure of exactly what they contain.

Why should anyone care about this? Let’s assume you are a good open source citizen and want to ensure that you are not violating a software component’s distribution license. How can you make sure that you are in compliance if you cannot even tell what you are distributing? I won’t go into the details of what would happen if one were to ignore a software component’s license obligations, as I am not a lawyer (hint: lawsuits are bad).

Compliant Containers

Now that you know the pitfalls of building containers imperatively, you can take steps to build more compliant containers. Essentially, when building containers, you should try not to deviate from your regular build and release pipeline. You need to treat a container like any other deliverable, not as an afterthought. Here are some basic software delivery best practices that you should be following:

  • Mirror your sources
  • Know your Bill of Materials (BoM)
  • Tag source code that goes into a build (know your BoM)
  • Build from source, as much as possible
  • Identify proprietary binaries, their versions and build numbers (know your BoM)
  • Document and version control your build stage
  • Include human checks and promotions for builds
  • Document your releases
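
To make "know your BoM" a little more concrete, here is a minimal sketch in Python of what a BoM record and a pre-release check might look like. Everything in it is an assumption made for illustration: the Component fields, the missing_fields and audit helpers, and the rule that source-built components need a source tag while proprietary binaries need a build number. It is not a standard BoM format and not a description of any particular tool.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


@dataclass
class Component:
    """One entry in a hypothetical Bill of Materials."""
    name: str
    version: str
    license: Optional[str] = None       # e.g. an SPDX license identifier
    source_tag: Optional[str] = None    # tag of the mirrored source that was built
    proprietary: bool = False           # True for binaries not built from source
    build_number: Optional[str] = None  # only meaningful for proprietary binaries


def missing_fields(component: Component) -> List[str]:
    """Return the BoM fields this component still lacks."""
    missing = []
    if not component.license:
        missing.append("license")
    if component.proprietary:
        # Proprietary binaries are identified by version and build number.
        if not component.build_number:
            missing.append("build_number")
    elif not component.source_tag:
        # Source-built components should point at a tagged, mirrored source.
        missing.append("source_tag")
    return missing


def audit(bom: List[Component]) -> Dict[str, List[str]]:
    """Map each incomplete component name to the fields it is missing."""
    gaps = {}
    for component in bom:
        missing = missing_fields(component)
        if missing:
            gaps[component.name] = missing
    return gaps


def to_json(bom: List[Component]) -> str:
    """Serialize the BoM so it can be version controlled with the release."""
    return json.dumps([asdict(c) for c in bom], indent=2, sort_keys=True)
```

A human check before promoting a build could then be as simple as refusing to release while audit(bom) returns anything, and committing the output of to_json(bom) alongside the release notes.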

For containers, the only extra thing to consider is that they are meant to be ephemeral. That means they are not meant to be updated, only destroyed and recreated. Keeping that in mind, you should:

  • Formulate a container lifecycle strategy
  • Avoid ‘running’ anything within the container, as much as possible
  • If you use a container as a build stage, have complete knowledge and control of it, as you would of any other build stage
  • Save your build stage
  • Create new containers when a bug fix, security fix or hotfix is needed
  • Document patches, security fixes and hotfixes along with your replaced container
  • Be transparent about your container lifecycle strategy
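
One way to document what changed between a container and its replacement is to compare their BoMs. The sketch below assumes each BoM has already been reduced to a plain mapping of component name to version. The bom_diff function and the sample component names are hypothetical, and how you obtain such a mapping depends on your own pipeline.

```python
from typing import Dict, Tuple


def bom_diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, dict]:
    """Compare two name -> version mappings and report what changed."""
    added = {name: ver for name, ver in new.items() if name not in old}
    removed = {name: ver for name, ver in old.items() if name not in new}
    changed: Dict[str, Tuple[str, str]] = {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical BoMs for a container and the one that replaces it.
old_bom = {"openssl": "1.1.1k", "myapp": "2.3.0", "curl": "7.76.0"}
new_bom = {"openssl": "1.1.1l", "myapp": "2.3.0", "jq": "1.6"}

report = bom_diff(old_bom, new_bom)
for kind in ("added", "removed", "changed"):
    print(kind, report[kind])
```

A report like this can be stored next to the replaced container's release notes, so that each patch, security fix or hotfix is tied to the components it actually touched.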

If you do this, you will know what you are putting in every container you make and be able to share the information with whoever needs it, meeting your provenance and compliance obligations. In a larger sense, you will build trust in the pipeline – your customers will trust you more to deliver what they asked for.

The bottom line: generating clean, accurate metadata about the software that goes into a container is just as important as the container itself.

Be sure to stay up-to-date with our Open Source Blog and follow us on Twitter (@vmwopensource) for more open source deep dives.