By Alexandre Courouble and John Hawley
How do you know if your open source project is succeeding? Isn’t it just a question of counting users, contributors, commits or stars on GitHub? In our previous post, we looked at the impediments that currently stand in the way of creating solid progress metrics for open source development. This time, we’ll ask how we can start to make things better.
One thing to remember when thinking about open source metrics is that, until relatively recently, there was essentially no way to measure open source performance on a large scale. With that said, we should all give ourselves a bit of a break and acknowledge that we’re still in the early days. This is a good problem to have, in a way, since it means that open source has been a raging success that continues to mature. It also means that while we’re far from the only people thinking about the issue of metrics in open source, the territory of open source metrics is still very much up for grabs and ready to be defined by creative community thinking.
An emerging open source metric for Git code reviews: Email2git
We likely won’t find our answers in solutions that only engage with GitHub. Many open source projects, including Linux, use Git on its own and conduct their code reviews elsewhere, such as on mailing lists. Here, a solution developed by Alexandre as part of his graduate studies points to how we might proceed. Alexandre was interested in tracking the discussions around patches, which reveal what developers are thinking when someone submits code. These discussions contain valuable information that can be mined for both quantitative and qualitative data.
So, what do you do if the project you are interested in doesn’t use GitHub and thus lacks the capacity to track code reviews? Alexandre’s solution was Email2git, which tracks code reviews starting from the commit. Email2git is an open source tool that finds the mailing list conversations that match specific commits in Linux. This lets you easily find any conversation, whether just to read it or to parse the text and draw data out of it.
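To make the idea of matching commits to mailing list conversations concrete, here is a minimal sketch in Python. It is not Email2git’s actual implementation. It assumes one simple heuristic, comparing a commit’s summary line with email subjects after stripping reply markers and bracketed tags such as "[PATCH v2 1/3]". The function names and the sample data shapes are hypothetical.

```python
import re

# Assumed heuristic: drop leading "Re:"/"Fwd:" markers and bracketed tags
# such as "[PATCH v2 1/3]" so a patch email subject can be compared with
# a commit summary line.
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?):\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/tag prefixes, collapse whitespace, and lowercase."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()


def match_emails_to_commits(commits, emails):
    """Map each commit sha to the message ids whose subject matches it.

    commits: iterable of (sha, summary) pairs (hypothetical shape)
    emails:  iterable of (message_id, subject) pairs (hypothetical shape)
    """
    by_subject = {}
    for message_id, subject in emails:
        by_subject.setdefault(normalize_subject(subject), []).append(message_id)
    return {
        sha: by_subject.get(normalize_subject(summary), [])
        for sha, summary in commits
    }
```

Subject matching alone will miss patches whose summary was reworded before merging, so a real tool would likely need additional signals, such as the author or the changed lines.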
Email2git also allows you to capture contributions beyond straightforward code contributions. Linux subsystem maintainers, for example, can spend a lot of time reading patches or reading reviews – work that is every bit as vital as drafting the original code. Email2git makes that work visible and measurable. You can then draw out the data however you want: the number of interactions in the review, how many people were involved, etc. You can even apply sentiment analysis to the conversation to infer the value attributed by the team to a specific commit.
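As an illustration of the kind of data you could draw out, the sketch below counts the interactions and the people involved in one review thread. It assumes the thread has already been retrieved as a list of messages with the patch submission first. The `review_metrics` name and the dictionary layout are hypothetical and not part of Email2git.

```python
from email.utils import parseaddr


def review_metrics(thread):
    """Summarize a review thread.

    thread: list of dicts with a "from" header string; the first message
    is assumed to be the patch submission (hypothetical shape).
    """
    if not thread:
        return {"messages": 0, "replies": 0, "participants": 0, "reviewers": 0}

    # Reduce "Name <addr>" headers to lowercase addresses so the same
    # person is counted once even if the display name varies.
    addresses = [parseaddr(message["from"])[1].lower() for message in thread]
    author = addresses[0]
    participants = set(addresses)

    return {
        "messages": len(thread),
        "replies": len(thread) - 1,                 # everything after the patch
        "participants": len(participants),          # distinct senders
        "reviewers": len(participants - {author}),  # senders other than the author
    }
```

Counts like these credit the people who read and respond to patches, not only the person who wrote the code.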
Let’s talk open source metrics at CHAOSScon
We will be talking about the general problem of visibility into open source data and offering a demo of Email2git at the upcoming CHAOSScon, held before Open Source Summit North America in Vancouver, Canada at the end of this month. CHAOSS, if you don’t know, is the Community Health Analytics Open Source Software project, and this year’s event is expected to have a particular focus on metrics.
We see Email2git as just one example of how we can improve data acquisition techniques for open source metrics. Really, what we want to do here and at gatherings like CHAOSScon is keep the conversation going. What else do we need to be talking about? From our perspective, we’d like to see better access to raw data. A standardized data structure would help a lot as well.
We noted in part one, however, that different kinds of open source creators and users will want to prioritize different kinds of metrics. So, we’re not going to claim that the metrics that we’d like to see made more solid are what everyone else should be working on. However, we clearly need to make progress as a community on this. If you are concerned about these issues and planning to attend CHAOSScon, please come find us and share your thoughts. Maybe we can begin to build the framework for an approach that will work for us all.
For all things open source, be sure to visit our blog and follow us on Twitter (@vmwopensource).