By Nisha Kumar, Open Source Engineer, VMware
The benefits of the microservice architecture for web applications and platforms have VMware’s customers demanding support for containers. Container build tools like Docker make it easy to build and deploy a microservice. Simply choose a base container image to start from (readily available on Docker Hub), install your web application and its requirements, and deploy!
Not so fast: There is no documentation on what is installed in the base image downloaded from Docker Hub. There is also no control over what gets installed on top of it.
Why does this matter to VMware? We have an obligation to follow the license requirements for any open source components installed in a container that we ship to a customer. If the licenses of those components require us to provide the source code used to build them, we must provide that source code. If we do not know what is installed in the container, we cannot meet these obligations. Ignoring them falls short of VMware’s standards for integrity and customer focus and hinders our efforts to be good open source citizens.
Hence, we set out to automate the process of finding the software components installed in a container, creating a “Bill of Materials” for them, and extracting the sources those packages were built from, using the installed package management system.
We also wanted to take an extra step and open source this effort. We understood that this is a common problem faced by the open source community, as well as other software businesses. Our goal was to make a tool that would be useful to the software community at large – helping its members understand container compliance and take the next step in building compliant containers.
The outcome is Tern, a tool that analyzes Docker containers to find the software packages installed in them and helps developers identify changes to the Dockerfile that would make the containers more compliant. Tern does this by parsing a Dockerfile, first inspecting the packages installed in the base container image (the FROM line in a Dockerfile) and then trying to extract package information from the install commands that are run on top of the base container (the RUN lines in a Dockerfile).
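To make the FROM/RUN split concrete, here is a minimal Python sketch of that first parsing step. It is an illustration only, not Tern’s actual code: the function name `parse_dockerfile` is hypothetical, and the sketch assumes a simple single-stage Dockerfile with no flags on the FROM line.

```python
def parse_dockerfile(text):
    """Return (base_image, run_commands) from the text of a Dockerfile.

    Hypothetical helper for illustration; assumes a single-stage
    Dockerfile and no flags (such as --platform) on the FROM line.
    """
    # Join physical lines that end in a backslash into one logical line.
    logical_lines = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "
            continue
        logical_lines.append(pending + line)
        pending = ""
    if pending:
        logical_lines.append(pending.strip())

    base_image = None
    run_commands = []
    for line in logical_lines:
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # instruction without an argument
        instruction, argument = parts[0].upper(), parts[1]
        if instruction == "FROM":
            # "FROM image AS name": keep only the image reference.
            base_image = argument.split()[0]
        elif instruction == "RUN":
            run_commands.append(argument)
    return base_image, run_commands
```

Given a Dockerfile that starts with `FROM debian:jessie` and contains one multi-line `RUN` instruction, the function returns the image name to inspect first and the shell command to examine for package installs.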
Tern’s architecture has two sections: a Command Library and a Cache. The Command Library is a knowledge base of software install commands for base OS images (typically the commands of the OS’s package manager tool) and for application package managers (like maven, pip and npm). The Cache contains known filesystems and the packages they contain. Tern’s executable extracts package metadata from the container image using known package managers, unknown software install scriptlets, and the Cache for anything it has seen previously. Tern then produces a report detailing which line in the Dockerfile brought in which packages and provides hints as to what can change in order to build a more compliant container.
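The interplay between a command library and a cache can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `COMMAND_LIBRARY` table, the `packages_from_run` and `analyze` functions, and the idea of keying the cache by a layer identifier are hypothetical and do not reflect Tern’s real data formats. The sketch also treats every non-flag word after an install command as a package name, so a flag that takes a value (such as `pip install -r requirements.txt`) would be misread.

```python
import shlex

# Hypothetical command library: (tool, subcommand) -> package manager label.
COMMAND_LIBRARY = {
    ("apt-get", "install"): "apt",
    ("yum", "install"): "yum",
    ("pip", "install"): "pip",
    ("npm", "install"): "npm",
}


def packages_from_run(command):
    """Guess (manager, package) pairs from the shell command of a RUN line."""
    found = []
    for part in command.split("&&"):
        tokens = shlex.split(part)
        # Drop option flags such as -y; see the limitation noted above.
        words = [t for t in tokens if not t.startswith("-")]
        if len(words) >= 2 and (words[0], words[1]) in COMMAND_LIBRARY:
            manager = COMMAND_LIBRARY[(words[0], words[1])]
            found.extend((manager, name) for name in words[2:])
    return found


def analyze(layer_id, run_command, cache):
    """Return packages for a layer, reusing the cache when it was seen before."""
    if layer_id in cache:
        return cache[layer_id]
    packages = packages_from_run(run_command)
    cache[layer_id] = packages
    return packages
```

With an empty dictionary as the cache, the first call to `analyze` for a layer parses the command, and later calls with the same layer identifier return the stored result without parsing again.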
Tern is now evolving toward being a generic container analysis tool. This work is being done to meet the container community where it currently is. There is a push to minimize the size of container images by including only the dependencies that a specific application needs. This means Tern must be equipped with a variety of methods to approximate what packages are installed in any container image. Currently, development is headed toward decoupling Docker-specific operations so that the tool can more easily serve a wider set of users.
Tern’s source code can be viewed at https://github.com/vmware/tern. We welcome discussions about, and contributions to, the project. Feel free to start a conversation with us at https://groups.google.com/forum/#!forum/tern-discussion. Stay tuned to the Open Source Blog and follow us on Twitter (@vmwopensource) for all the latest around open source projects.