
What Makes a Build Reproducible, Part 2

In part 1 of this blog series, we categorized three common levels of reproducible builds: repeatable builds, rebuildable builds and binary reproducible builds. While binary reproducible builds offer the strongest supply chain assurances and benefits, they are also the most difficult to achieve. In this installment, we’ll discuss why open source developers should care about binary reproducible builds and the positive effect they can have on the whole ecosystem.

While reproducible builds are important for all developers, they are especially essential for open source developers. This is because open source users tend to implicitly trust developers and their source code. Take, for example, the container ecosystem. It’s common practice to pull a base image provided by someone else and build your entire application on top of it, despite knowing little about how that image was produced. Many open source projects also instruct users to install their binaries by simply running “wget <binary>”, and most users do so without a second thought. Some of this trust may stem from the fact that the source code of open source projects is by nature open to inspection, and with distributed revision control systems like Git, it’s easier to detect code tampering. But between the accessible source code and the conveniently usable binaries sit build systems and the content repositories where produced binaries are stored, either of which can introduce malicious backdoors or unauthorized tampering. Instead of having to blindly trust developers, reproducible builds provide a way to prove that a binary matches its claimed source.
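At its core, that proof is a byte-for-byte comparison: rebuild the artifact from the claimed source (or rely on an independent rebuilder) and check that its cryptographic digest matches the published one. Below is a minimal sketch in Python; the file paths are hypothetical placeholders for a downloaded artifact and one rebuilt locally from the same source revision.

```python
import hashlib
import sys

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    # Hypothetical paths: the artifact you downloaded and the one you
    # rebuilt yourself from the claimed source revision.
    downloaded = sha256_of("downloaded/myapp")
    rebuilt = sha256_of("rebuilt/myapp")
    if downloaded == rebuilt:
        print("MATCH:", downloaded)
    else:
        print("MISMATCH:")
        print("  downloaded:", downloaded)
        print("  rebuilt:   ", rebuilt)
        sys.exit(1)
```

If a build is truly binary reproducible, the digests match and anyone can repeat the check; when they don’t, tools like diffoscope can help pinpoint where the artifacts diverge.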

How can we move toward binary reproducible builds?

The first step in moving toward universal binary reproducible builds is to understand the factors that currently make build reproducibility harder to achieve, and then how to avoid those pitfalls. Current roadblocks include a lack of common understanding of what binary reproducible builds actually entail and the fact that development tools can sometimes work against us. For more information about what exactly makes a build reproducible, refer to part 1 of this blog series.

From a tooling perspective, not all tools were created with reproducibility in mind. Some were built in more innocent times, before reproducibility was a consideration, and many were designed primarily to make getting started with a given language or ecosystem easier rather than to enable reproducible builds. When a tool is not reproducibility-focused by design, implementing reproducible builds requires more work from developers, and from a purely practical resource perspective, that may or may not be feasible. Reproducible builds also become harder the further up the stack you go and as scope and scale increase.
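To make one of these pitfalls concrete, consider archives: many packaging steps produce a tarball whose bytes depend on directory scan order, file timestamps and the user running the build, so two otherwise identical builds yield different artifacts. The helper below is an illustrative sketch, not taken from any particular tool, and assumes a directory of already-built files.

```python
import gzip
import os
import tarfile

def deterministic_tarball(src_dir: str, out_path: str) -> None:
    """Pack src_dir into a .tar.gz whose bytes do not depend on directory
    scan order, file timestamps, or the user/group running the build."""
    # Walk in a fixed, sorted order; os.walk order can vary across runs
    # and filesystems.
    paths = []
    for root, dirs, files in os.walk(src_dir):
        dirs.sort()
        paths.extend(os.path.join(root, name) for name in sorted(files))

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # Clamp metadata that usually leaks the build environment.
        info.mtime = 0
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    # gzip embeds a timestamp in its own header, so pin that as well.
    with open(out_path, "wb") as raw, \
         gzip.GzipFile(fileobj=raw, mode="wb", mtime=0) as gz, \
         tarfile.open(fileobj=gz, mode="w") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.relpath(path, src_dir),
                    filter=normalize)
```

None of this is difficult once you know to look for it, but it is exactly the kind of work that a tool not designed for reproducibility leaves to the developer.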

Once you understand the pitfalls that can hinder reproducible builds, it’s important to scope appropriately when refining your build process to be more reproducible. Instead of trying to solve this issue for an entire ecosystem, focus on the processes and artifacts you have direct control over (i.e., the binaries you produce or consume and your own build process). If your project is more complex, remember that achieving binary reproducible builds takes time and that incremental progress is better than no progress. The Reproducible Builds project and its website are incredibly valuable resources for understanding the specifics of reproducible builds.

As you refine your build process, consider adopting the mechanisms that already exist for your ecosystem(s). Some build tools are better served than others, and if your current tools don’t support reproducible builds, there may be alternatives that do. You can also file feature requests to let tool maintainers know that reproducible builds are a priority for their users. The more tool developers hear that their users care about reproducibility, the more they will prioritize implementing and supporting these features.
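One widely adopted mechanism of this kind is the SOURCE_DATE_EPOCH environment variable, a convention defined by the Reproducible Builds project for pinning timestamps that get embedded in build output. A hypothetical build script might honor it like this:

```python
import os
import time

def build_timestamp() -> int:
    """Return the timestamp to embed in build artifacts.

    Honors the SOURCE_DATE_EPOCH convention from the Reproducible Builds
    project: when the variable is set, use it instead of the wall clock so
    that repeated builds of the same source embed the same time.
    """
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return int(epoch)
    return int(time.time())

if __name__ == "__main__":
    # Hypothetical use: stamp a version string with a reproducible date.
    stamp = time.strftime("%Y-%m-%d", time.gmtime(build_timestamp()))
    print("build date:", stamp)
```

A build system that exports SOURCE_DATE_EPOCH (typically derived from the last commit date) then gets identical output from repeated builds of the same source, while developers building interactively still see the current time.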

As a community, we can invest time and resources in ecosystem tooling and collaborate on it. Investing in working groups and collaboration spaces like Reproducible Builds, which develop standards, produce tools and define specifications that help projects achieve build reproducibility, is worthwhile. So is investing in efforts like rebuilderd, a project that independently verifies the reproducibility of Linux distribution binary packages. The more rebuilders we can deploy, the better it is for the entire ecosystem. By pooling our resources and rebuilding across multiple systems, we can gain confidence in the security and reproducibility of the software we depend on.

The reproducibility effect

Reproducible builds are the first step toward engineering excellence and a more secure software supply chain, but why should we stop there? Once we understand the principles and motivations behind reproducible builds, let’s apply them everywhere: reproducible builds, reproducible build servers, reproducible infrastructure and so on. Let’s aim for full traceability and confidence in every part of the development cycle. If we verifiably derive as much of our pipeline and infrastructure as possible from source code, we can require multi-party approval of changes and redeploy with ease.

Reproducibility also means we don’t need to keep anything around for extended periods of time. Instead of letting a build environment sit where it’s at risk of tampering, we can simply recreate it when we need it. This idea of “short-lived everything” in the development pipeline means more secure software all around. Even if bad actors manage to gain access to a build system once, they don’t retain that access after the machine is torn down. By regularly redeploying ephemeral environments and hosts, we can protect against persistent threats.

Applying consistent and reliable reproducibility principles to our software development pipelines and infrastructure is not just good for our own software; it’s good for the entire ecosystem. It provides transparency into how things are done, which makes it harder for bad actors to hide, and it creates an audit trail that makes it easier to spot malicious actors when they attempt to intervene.

Stay tuned to the Open Source Blog and follow us on Twitter for more deep dives into the world of open source contributing.