Measuring Constructiveness and Inclusivity in Open Source

Part 1 of this blog series discussed the importance of identifying constructiveness and inclusivity in open source, and Part 2 explained the data representation techniques used. In this last part of the series, we focus on training models to predict constructiveness and inclusivity.

Prediction Models

With the data tagged and transformed, we were ready to train models. We experimented with a variety of machine learning models and data representations to determine which combination best predicts constructiveness and inclusivity from the input data. Because we hypothesized that back-and-forth communication is important to measure, we expected the 3-dimensional matrix representation discussed in Part 2 to yield the best results. Computer vision models likewise represent images as 3D matrices. Given this similarity between our data and images, we felt that Convolutional Neural Networks, the state of the art for computer vision, would also perform well on our data.
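To make the image analogy concrete, here is a minimal sketch of a single convolution filter sliding over a conversation. The axis layout (comment, token, feature) and the tiny hand-written values are assumptions made for illustration; they are not the exact representation from Part 2 or the architecture we trained. The point is only that a filter spanning two adjacent comments can respond to a pattern in one comment followed by a pattern in the reply.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # assumed layout: [comment][token][feature]


def conv_valid(conversation: Tensor3D, kernel: Tensor3D) -> List[List[float]]:
    """Slide one filter over the comment and token axes (stride 1, no padding).

    The filter covers every feature at each position, as a CNN filter covers
    every channel of an image. Like most CNN libraries, this computes a
    cross-correlation (the kernel is not flipped).
    """
    n_comments, n_tokens = len(conversation), len(conversation[0])
    n_features = len(conversation[0][0])
    k_comments, k_tokens = len(kernel), len(kernel[0])

    output = []
    for i in range(n_comments - k_comments + 1):
        row = []
        for j in range(n_tokens - k_tokens + 1):
            total = 0.0
            for di in range(k_comments):
                for dj in range(k_tokens):
                    for f in range(n_features):
                        total += conversation[i + di][j + dj][f] * kernel[di][dj][f]
            row.append(total)
        output.append(row)
    return output


if __name__ == "__main__":
    # Hypothetical conversation: 3 comments x 2 tokens x 2 features.
    conversation = [
        [[1, 0], [0, 1]],
        [[0, 1], [1, 0]],
        [[1, 1], [0, 0]],
    ]
    # Hypothetical filter spanning 2 consecutive comments and 1 token: it
    # responds to feature 0 in a comment followed by feature 1 in the reply.
    kernel = [[[1, 0]], [[0, 1]]]
    print(conv_valid(conversation, kernel))
```

A real network would learn many such filters and stack them with nonlinearities and pooling; the sketch shows only the sliding-window step that makes the representation a natural fit for image-style models.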

Table 1: Prediction Performance – Comparing Results Between Algorithms (The table shows the performance in predicting “constructive” and “inclusive” labels for all data representations and machine learning algorithms tested. “No Fit” indicates that the model did not learn and predicts only one label.)

The highlighted results support the utility of our novel data representation for conversations: the machine learning models trained on that representation show significantly higher accuracy. Model metrics for the 2D and 3D matrices are very similar, so additional training is required to conclusively determine the best data representation. Overall, we were able to train Convolutional Neural Networks that predict the constructive and inclusive labels with high accuracy (80% and 90%, respectively).
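The “No Fit” condition is easy to check for automatically. The following is a minimal sketch, not the evaluation code used for Table 1: the function name `evaluate` and the returned keys are hypothetical. It reports accuracy alongside a flag for models that always predict the same label, plus the majority-class baseline, because a collapsed model can still score a deceptively high accuracy on imbalanced labels.

```python
from collections import Counter
from typing import Dict, Sequence, Union


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, Union[float, bool]]:
    """Compute accuracy and flag a model that predicts only one label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    # "No Fit": every prediction is the same label, so the model learned nothing.
    no_fit = len(set(y_pred)) == 1

    # Accuracy obtained by always guessing the most common true label.
    majority_count = Counter(y_true).most_common(1)[0][1]
    majority_baseline = majority_count / len(y_true)

    return {
        "accuracy": accuracy,
        "no_fit": no_fit,
        "majority_baseline": majority_baseline,
    }
```

Comparing `accuracy` with `majority_baseline` gives a quick sense of whether a reported score reflects learning or simply the label distribution.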

Understanding the Prediction

Ultimately, we want to enable contributors to be more constructive and inclusive in their feedback. Insights into how to do that can be obtained by analyzing how and why the machine learning models make a prediction. Using TensorFlow Explain’s Smooth Gradient, we combine the node activations and gradients from each layer of the neural network for a particular input to obtain the contribution of each input to the output.
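The general idea behind Smooth Gradient can be sketched in a few lines: add random noise to the input several times, take the gradient of the model output with respect to each noisy input, and average the gradients. The sketch below is not the library's implementation or our project code. It assumes a generic scoring function `f` and estimates gradients numerically so that it runs without any deep learning framework; the names `numerical_gradient` and `smoothgrad` and all default parameter values are chosen for illustration only.

```python
import random
from typing import Callable, List, Sequence

ScoreFn = Callable[[Sequence[float]], float]


def numerical_gradient(f: ScoreFn, x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Central-difference estimate of df/dx_i for every input dimension."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def smoothgrad(
    f: ScoreFn,
    x: Sequence[float],
    n_samples: int = 50,
    noise_std: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Average the gradient of f over noisy copies of x."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]
        grad = numerical_gradient(f, noisy)
        total = [t + g for t, g in zip(total, grad)]
    return [t / n_samples for t in total]


if __name__ == "__main__":
    # Toy stand-in for a model: a weighted sum of three input features.
    weights = [2.0, -1.0, 0.5]

    def toy_model(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x))

    print(smoothgrad(toy_model, [1.0, 1.0, 1.0]))
```

For a linear toy model the averaged gradient simply recovers the weights, which makes the sketch easy to sanity-check. With a real neural network, the gradients would come from automatic differentiation rather than finite differences, and the averaging is what smooths out noisy per-input attributions.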

For example, the model predicts pull request #86 of Kubeflow KFServing to be constructive. Below is how the model analyzed a comment within that pull request:

“Currently ValidateUpdate ignores the old object and just calls Validate—which equates to: ‘you validate yourself’… If we change that (and I agree it will be nice not to have to), we will have to add a method on the interface that checks if the fields that changed are mutable by passing in old and new to validate… so in that sense an extra fn.. so going to keep the method as it is for now.”

– Comment on Pull Request #86 of the KFServing project by Kubeflow

Analysis of the comment for Constructiveness

Based on the analysis, the phrases “we change that,” “by passing in” and “that changed are” contributed positively to determining constructiveness. Similarly, we can also analyze phrases important to inclusivity.
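One simple way to go from per-word contributions to phrases like these is to sum the scores over a sliding window of consecutive words and rank the windows. This is a sketch of that idea under our own assumptions, not necessarily how the figure above was produced; the function name `top_phrases` is hypothetical, and the scores in the usage example are invented purely to show the mechanics.

```python
from typing import List, Sequence, Tuple


def top_phrases(
    tokens: Sequence[str],
    scores: Sequence[float],
    n: int = 3,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank n-word phrases by the sum of their per-word attribution scores."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")

    phrases = []
    for start in range(len(tokens) - n + 1):
        phrase = " ".join(tokens[start:start + n])
        phrases.append((phrase, sum(scores[start:start + n])))

    # Highest positive contribution first.
    phrases.sort(key=lambda item: item[1], reverse=True)
    return phrases[:k]


if __name__ == "__main__":
    # Words taken from the quoted comment; the scores are made up.
    tokens = ["if", "we", "change", "that", "and", "i", "agree"]
    scores = [0.0, 0.4, 0.5, 0.3, -0.1, 0.0, 0.1]
    for phrase, score in top_phrases(tokens, scores, k=2):
        print(f"{phrase}: {score:.2f}")
```

Sorting in ascending order instead would surface the phrases that pushed the prediction away from the label, which could be just as useful for giving feedback to a commenter.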

Current Shortcomings

In this experiment, our implementation appears to identify and analyze constructive and inclusive communication with high accuracy. However, because of resource constraints, we have only tested the framework on one small project. In particular, labeling, tagging and transforming data is time-consuming and labor-intensive. A broader sample of projects, with data labeled by people from diverse cultural backgrounds, would reduce data bias and improve the models.

Future Work

In this three-part blog series, we discussed the importance of constructive and inclusive communication in open source, devised possible representations for conversational data, and developed models to predict and understand constructiveness and inclusivity. While the current implementation requires annotated data for each project, further work in transfer learning could make the models extensible to new projects without the need for newly annotated data specific to each project. The current framework could also be combined with generative models like GPT-3 to build an application that nurtures constructiveness and inclusivity in real time, just as Grammarly nurtures proper grammar.

If you’re interested in collaborating on this project, come join us at vmware-labs / ml-conversation-analytic tool on GitHub. We welcome all contributors!
