By Alexandre Courouble
The sustainability of open source projects often relies on a single person: the maintainer. Maintainers are often the only people with commit access to the source code, meaning they are responsible for the future of the project. For projects with a single maintainer, this could represent an important security issue. As industries of all kinds come to depend more heavily on open source code, this is a danger we need to be more aware of and have better strategies in place to address.
For example, we need to maximize the chances that the people we pick to be project maintainers are honest, trustworthy leaders. In this post, I’ll briefly share some research I’ve conducted in this area that points to some metrics we could use to select maintainers most likely to ensure a project’s long-term success.
First, it’s worth noting that this isn’t just a theoretical or fringe concern. We can point to a number of recent examples where people gained maintainer access to projects and then injected malicious dependencies into them. They do this for any number of reasons: for outright theft, for revenge or just for the LOLs. But the consequences can be severe, even when the project that’s hijacked is small and essentially dormant.
Take what happened late last year with the event-stream Node.js library. The project had essentially achieved its goals, so its maintainer wasn’t especially invested in sticking around. When a new contributor turned up out of the blue and expressed interest in taking over as maintainer, the original maintainer happily handed over the reins. But the new maintainer swiftly installed a dependency called flatmap-stream that allowed him to steal Bitcoin from a small subset of event-stream users.
The larger problem here was that event-stream was used by many other projects: it appeared in over 300,000 GitHub repositories and over 3,300 build packages. That’s not unusual for an all-but-dormant project. Open source projects build on each other, after all. But it meant that the malicious code wasn’t contained inside event-stream. It rapidly spread far and wide.
As it happened, the event-stream attack targeted a very small group of people, so its damage was limited. But things could be a lot worse next time. It’s hard to track these dependencies, so it’s hard to know which projects are safe to include in whatever work you are doing. The whole system is built on an edifice of trust. But how do we ensure that trust?
My graduate thesis explored this question. In examining how maintainers are chosen in a well-run and highly trusted open source project, it sought to draw conclusions about patterns in maintainer promotion that could either allay concerns or raise red flags that a potentially abusive maintainer was about to be given control of a project.
My research set comprised 734 Linux subsystem maintainers, and I wanted to see how they came to occupy their positions. What characteristics and attributes did they have in common? And could we build a model using those characteristics and attributes that could predict who might be next in line to be (appropriately) appointed to maintain any specific project?
I looked at several data points, including code written, previous commits and the code reviews that people submitted over a period of about a year. Then I created a footprint formula that allowed me to compare people’s contributions over time to their final contributor status. What I found was that people who ended up as trusted maintainers tended to be among the highest overall contributors of code to the project but had also gradually contributed less code over the year. Relatedly, they also spent an increasing amount of time on reviews.
The best-performing pattern, then, was a specific combination of lines of code, commits and reviews contributed over time. This allowed me to observe that the most trusted maintainers gain their status in a particular way over a particular time period.
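To make the idea of comparing contributions over time more concrete, here is a minimal Python sketch. It is not the footprint formula from the thesis, which isn't reproduced in this post. It simply assumes you have monthly counts of lines of code, commits and reviews for one contributor, and fits a straight-line trend to each. The names `MonthlyActivity`, `slope` and `activity_trends` are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class MonthlyActivity:
    """One month of activity for a single contributor (hypothetical schema)."""
    lines_of_code: int
    commits: int
    reviews: int


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values plotted against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0  # a trend needs at least two points
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def activity_trends(history: Sequence[MonthlyActivity]) -> Dict[str, float]:
    """Summarize totals and per-month trends for one contributor."""
    return {
        "total_lines": sum(m.lines_of_code for m in history),
        "total_commits": sum(m.commits for m in history),
        "total_reviews": sum(m.reviews for m in history),
        # Negative: the person is writing less code each month.
        "code_trend": slope([m.lines_of_code for m in history]),
        # Positive: the person is reviewing more each month.
        "review_trend": slope([m.reviews for m in history]),
    }


# Made-up numbers: code output falls while review activity rises.
history = [
    MonthlyActivity(lines_of_code=400, commits=12, reviews=1),
    MonthlyActivity(lines_of_code=300, commits=10, reviews=2),
    MonthlyActivity(lines_of_code=200, commits=7, reviews=3),
    MonthlyActivity(lines_of_code=100, commits=4, reviews=4),
]
trends = activity_trends(history)
```

With these made-up numbers, `code_trend` comes out negative and `review_trend` positive, which is the shape described above: a heavy code contributor gradually shifting toward review work. A linear fit is only one simple way to capture that shift, and it says nothing about how the signals should be weighted against each other.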
I don’t want to suggest there’s a perfect formula here for measuring maintainer risk. Good maintainers also have soft skills that are harder to measure. But in terms of guidelines for what to be aware of when looking for a new maintainer, the research suggests that it’s definitely worth favoring people who:
- Have made many contributions to the project in terms of lines of code, commits and reviews.
- Started out as heavy code contributors and then moved on to become significant reviewers.
- Have been with the project for longer than the average project contributor.
None of this is hugely surprising, perhaps. But it offers empirical support for what would otherwise be just a hunch. It also strongly suggests what an atypical maintainer succession will look like and, as a result, can help flag handovers that should be reviewed more fully. Clearly, it’s always worth being cautious about promoting anyone too quickly to maintainer status. People should first demonstrate that they truly care about a project and that they have deep knowledge of it.
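As a rough illustration of how the three guidelines could be turned into a handover check, consider the Python sketch below. Everything in it is an assumption made for the example: the `Contributor` fields, the `handover_flags` function, and the choice of the peer median and mean as cut-offs do not come from the research itself. A real project would need to pick its own thresholds, and no check like this replaces human judgment about soft skills.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List


@dataclass
class Contributor:
    """Summary of one contributor's history (hypothetical schema)."""
    name: str
    total_contributions: int  # lines of code, commits and reviews, however you combine them
    review_trend: float       # > 0 means review activity has grown over time
    tenure_days: int          # days since first contribution


def handover_flags(candidate: Contributor, peers: List[Contributor]) -> List[str]:
    """Return reasons a proposed handover deserves a closer look (empty if none)."""
    if not peers:
        return ["no peer data to compare against"]

    flags = []
    # Guideline 1: many contributions. Assumed cut-off: the peer median.
    if candidate.total_contributions < median([p.total_contributions for p in peers]):
        flags.append("overall contribution below peer median")
    # Guideline 2: moved from writing code toward reviewing.
    if candidate.review_trend <= 0:
        flags.append("no shift toward reviewing")
    # Guideline 3: longer tenure than the average contributor.
    if candidate.tenure_days <= mean([p.tenure_days for p in peers]):
        flags.append("tenure not above project average")
    return flags


# Made-up data for illustration.
peers = [
    Contributor("a", total_contributions=10, review_trend=0.0, tenure_days=100),
    Contributor("b", total_contributions=50, review_trend=1.0, tenure_days=200),
    Contributor("c", total_contributions=100, review_trend=2.0, tenure_days=300),
]
newcomer = Contributor("new", total_contributions=5, review_trend=0.0, tenure_days=30)
veteran = Contributor("vet", total_contributions=200, review_trend=2.0, tenure_days=900)

newcomer_flags = handover_flags(newcomer, peers)
veteran_flags = handover_flags(veteran, peers)
```

Here the newcomer trips all three checks while the veteran trips none. A flag is a prompt to slow down and review the handover, not proof of bad intent.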
Even when your project is mature, you still need to maintain the trust that’s been bestowed upon it. Every maintainer takes on the responsibility of making sure that their tool remains usable and sustainable into the future. With a little forethought and attention to how good successors emerge, that’s entirely achievable.