I am the maintainer of a neat little open source project called Tern. It inspects a container image and finds the metadata of the packages that are installed in it. This information can be used to solve some basic container compliance issues created by container build tools. A previous blog post, along with my talk at the Open Source Leadership Summit earlier this year, explains why a project like this is needed for containers. Initially, though, I had a lot of misgivings about working on a project like this because of these four assumptions:
- Open Source Software Compliance is a boring problem.
- Open Source Software Compliance is easy to automate.
- Open Source Software Compliance Solutions must already exist.
- Open Source Software Compliance is a chore.
Not even 10 months into this project, I can confidently tell you that open source software compliance is NOT easy to automate. Here are some of the obstacles that Tern is trying to overcome right now:
- Tern has its own shell command parser, but in its current state it is inefficient and barely works. The parser is needed because containers are predominantly built by running shell commands that are written either in a Dockerfile or in a shell script. Tern’s shell parser understands only typical package manager invocations that install or remove software packages, and nothing beyond that.
- Tern faces a similar problem with its Dockerfile parser. This parser is needed because most containers are built using the ‘docker build’ command, which takes a Dockerfile. Dockerfiles now exist for every open source and proprietary project, all manner of microservices, FaaS and PaaS platforms, pretty much anything you can stick in the cloud, and soon, IoT edge networks. Tern’s Dockerfile parser does not understand backslash line continuation in RUN commands and has no support for instructions like VOLUME and WORKDIR. I was told that a better parser exists, but I have yet to evaluate it.
- Tern has a database that maps filesystems to packages, which can be used to look up container filesystems and get all the packages they contain along with all the compliance metadata. Right now, it’s just one YAML file. It needs to be a full-fledged database with the ability to do a reverse lookup of the filesystems that contain a given package. That requires some modeling and MongoDB, with which I have no experience.
- Tern needs some method of tracking where the current image ends and where the imported images begin. For example, Docker allows you to build on top of images hosted on Docker Hub using the FROM directive. There is currently no metadata that shows where an imported image came from. This problem should ideally be addressed by the container community, but since the focus is on things like orchestration, observability and security scanning, it’s pretty much ignored. It is still an interesting problem that Tern is trying to tackle right now. There is currently no implementation that solves this problem, but an issue has been filed.
- Tern is in the business of creating pretty reports that make compliance more approachable. Producing a tangible human- and machine-readable report shows that the problem is a tractable one and makes it less scary. Formatting exception messages for different types of machines and humans is not an easy task, as demonstrated by the variety of Python parsing packages out there.
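To make the first two obstacles concrete, here is a minimal sketch of the kind of parsing involved. It is not Tern’s actual parser. The names `INSTALL_COMMANDS`, `logical_lines`, `installed_packages` and `summarize` are hypothetical, the table of package managers is a tiny illustrative sample, and the sketch assumes the subcommand comes right after the tool name (so `apt-get -y install curl` would be missed).

```python
import re
import shlex

# Hypothetical lookup table: package manager -> its install subcommand.
# A real tool would need many more entries, plus removal commands.
INSTALL_COMMANDS = {
    "apt-get": "install",
    "apk": "add",
    "yum": "install",
}


def logical_lines(dockerfile_text):
    """Join backslash-continued lines; drop blank lines and comments."""
    lines = []
    pending = ""
    for raw in dockerfile_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.endswith("\\"):
            # Keep accumulating until a line without a trailing backslash.
            pending += stripped[:-1].rstrip() + " "
        else:
            lines.append(pending + stripped)
            pending = ""
    if pending:
        lines.append(pending.strip())
    return lines


def installed_packages(shell_command):
    """Return package names from simple '<tool> <install> pkg...' commands."""
    packages = []
    # Naive split on '&&' and ';' (ignores quoting, which is fine for a sketch).
    for part in re.split(r"&&|;", shell_command):
        words = shlex.split(part)
        if len(words) < 2:
            continue
        tool, subcommand = words[0], words[1]
        if INSTALL_COMMANDS.get(tool) == subcommand:
            # Skip flags such as -y; everything else is treated as a package.
            packages.extend(w for w in words[2:] if not w.startswith("-"))
    return packages


def summarize(dockerfile_text):
    """Return (base image named by the last FROM, packages installed by RUN)."""
    base_image = None
    packages = []
    for line in logical_lines(dockerfile_text):
        instruction, _, argument = line.partition(" ")
        if instruction.upper() == "FROM":
            base_image = argument.split()[0]
        elif instruction.upper() == "RUN":
            packages.extend(installed_packages(argument))
    return base_image, packages


EXAMPLE = """\
FROM debian:stretch
RUN apt-get update && \\
    apt-get install -y curl git
WORKDIR /app
"""

print(summarize(EXAMPLE))  # ('debian:stretch', ['curl', 'git'])
```

Even this toy version shows why the problem is hard: every package manager has its own flags and subcommands, and shell syntax allows far more than `&&` chains.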
Another one of my concerns was that existing compliance tools might negate all of the work going into Tern. This is a typical concern for anyone embarking on an endeavor. The other existing projects were older and better resourced. However, an experienced open source contributor and maintainer gave me some sage advice: don’t look at other projects as competition, but rather as lessons on what worked and what didn’t.
People have been tackling this compliance problem for decades using a variety of methods, like controlling the sources that are used to build packages, publishing manifests, putting SPDX license identifiers in source files, security scanning (a la BlackDuck) and making developers trace back where they got the files to build their fancy applications. On top of creating tools, organizations have also placed enforcement procedures all the way up and down the supply chain, within and across teams.
However, the tools and the processes are disparate. There is no one process for all software creation. There are paradigms that work in one space and not in another; workflows that work for one kind of artifact and not another. People and organizations have short-term goals and short attention spans.
In light of that, one of Tern’s guiding principles is to keep in mind that there is a whole universe of software and approaches that exist to accomplish a given task. With regard to containers, any and all of them are being used. Tern’s implementation tries very hard not to make assumptions about your workflow. Tern’s logging and final reports are verbose by design, in order to be as transparent about its operation as possible so that you are aware of how it behaves in the context of your workflow. For example, if you happen to pull a container image from Docker Hub using the ‘latest’ tag (not a good idea, as ‘latest’ is not an immutable tag), Tern will make that fact known to you, but will not fail because of it.
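The “report it, but don’t fail” behavior for the ‘latest’ tag could look something like the sketch below. This is an illustration of the principle rather than Tern’s code: the function names `image_tag` and `tag_notes` and the logger name are made up for this example. The only outside fact it relies on is that Docker assumes the ‘latest’ tag when an image reference carries no tag.

```python
import logging

logger = logging.getLogger("image_check")  # hypothetical logger name


def image_tag(reference):
    """Return the tag of an image reference, or None if pinned by digest."""
    if "@" in reference:
        # e.g. debian@sha256:... is immutable, so there is no tag to check.
        return None
    # Only the last path component can carry a tag; an earlier colon
    # belongs to a registry port such as localhost:5000.
    last_component = reference.rsplit("/", 1)[-1]
    if ":" in last_component:
        return last_component.split(":", 1)[1]
    return "latest"  # Docker's default when no tag is given


def tag_notes(reference):
    """Collect and log notes about the image tag without raising an error."""
    notes = []
    if image_tag(reference) == "latest":
        notes.append(
            "Image '%s' uses the 'latest' tag, which is not immutable; "
            "the report may not be reproducible." % reference
        )
    for note in notes:
        logger.warning(note)
    return notes


for ref in ("debian", "debian:stretch", "localhost:5000/debian:latest"):
    print(ref, "->", len(tag_notes(ref)), "note(s)")
```

The notes are returned as well as logged so that they can be included in the final report; the caller carries on with the analysis either way.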
By now, I hope I have convinced you that open source compliance isn’t really boring at all. It’s a nuanced problem, which makes it an interesting and challenging one to solve. What about practicing open source compliance? Personally, I have noticed that the developers I admire are the ones who believe that if you start a task, you ought to leave no trace in the environment you are working in, whether that trace comes from you or from the software you write. This is exactly why containers were created in the first place: to isolate a process so it doesn’t mess with anything it shouldn’t. Jessie Frazelle is one developer celebrity who subscribes to the attitude of good software hygiene. Her blog about using containers as sandbox environments for her desktop application is widely shared in the container community.
Why would anyone bother to make more work for themselves in this fashion? Because good software hygiene provides a good programming experience in the long run. It’s the software equivalent of making your bed or doing your dishes. It’s not necessarily fun, but it does make your life much more enjoyable in the aggregate. If all of this has you interested in Tern and the many problems it is tackling, you can check it out here: https://github.com/vmware/tern.