
Agile Product Management and Machine Learning with Tanzu Labs

This article is for product managers looking to get a better understanding of how to manage machine learning products and projects. Its purpose is to enable product managers to better apply existing software delivery frameworks to machine learning work. The article is written for those who already have a contextual understanding of machine learning; please review the VMware Data Science Playbook for this context.

Agile software development is one of today’s most widely adopted software delivery frameworks. Its popularity stems from its immense flexibility in providing just enough structure to enable collaboration and accountability without stifling progress with too much overhead.

For product managers (PMs), agile enables the breakdown of complicated technical work into understandable and digestible chunks. It provides a model for predictability and agreed accountability within the team and offers flexibility within the software delivery process so that the product may be adjusted without risking months of work. For these reasons, agile is often seen as a catch-all for any software product team, or as the silver bullet of software delivery.

But does it work for machine learning products?

The answer, as with most answers in life, is yes and no. As with any software delivery effort, the behaviors that agile promotes, such as collaboration and accountability, are critical to a successful machine learning delivery. However, whereas agile's structure feels lightweight and flexible for traditional software products, it often becomes too heavy-handed for machine learning products. So let's first talk about what makes machine learning so different from traditional software delivery.

A tale of two delivery teams 

If agile works well with traditional software products, why would it not work just as well for machine learning products? At its essence, isn’t machine learning just another flavor of software? Yes, but with higher risk. Inherently, data science projects call for less familiar work, more specialized technical needs, and more trust in the engineering team. To demonstrate the differences, let’s make up a fake soda company: Super Soda.

Imagine a software application that predicts Super Soda sales for local grocery chains so they can stock the appropriate inventory. We want to improve this algorithm because there is too much overstocked Super Soda product in some areas and too little stocked product in others. To achieve this improvement, we fund two teams: one consists of a team of diehard software developers and the other, a team of diehard data scientists. Both teams will be managed by agile PMs. Each team's goal is to accurately predict how much Super Soda product to stock in individual grocery stores and to prove that its own engineering discipline is the better fit. Whoever delivers the most accurate predictions for individual grocery stores wins the race.

On your marks, get set…

The race begins and both teams start at the same line: their first task is to understand the problem space. In order to build a prediction model for Super Soda sales, they must understand what drives those sales. So the software developers and data scientists will both need feedback from experts such as salespeople, store managers, and end users.

Here is where agile benefits both parties equally. With agile, constant communication with stakeholders is not just encouraged but built into the structure of a weekly sprint. Every engineer knows their stakeholder by name and has both the ability and willingness to reach out with questions. With the help of their designers, they conduct thorough user interviews to better understand what drives Super Soda sales.

As it turns out, many things contribute to sales: the weather, population, demographic metrics, broad-strokes cultural trends, the presence of sporting events, and more. The list is endless, and that's just what their stakeholders can think of off the top of their heads. The two teams may not have a total understanding of all the factors, but they hope they've captured enough of the big ones to get started.

Their paths split

In a real race, both teams would run along the same track to the same finish line. Our metaphorical race breaks all these rules (calling into question whether it's a race at all), because now our two teams diverge for the first time.

The software developers dive into mathematical models. They start with what salespeople already use to predict their regional sales and try to break that down further so that it becomes applicable to individual grocery stores.

Here, the math gives way to a little bit of art as they once more consult experts for insights not obvious through number crunching. They may even consult financial analysts to learn how those analysts calculate their own sales projections. What they find is a combination of know-how, intuition, math, and plain guessing that they now need to combine and further hone into their software's prediction.

The data scientists dive into the data. They start with historical sales data broken down by each grocery store and try to map all the factors they have on top of this data. Ten years ago, what was the weather like? What sporting events happened? What was the population of the area? Matching all of that with sales numbers gives them a complete dataset they can use to train their machine learning model.
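
As a rough sketch of what assembling that training dataset might look like in Python (the file names and columns here are hypothetical, not taken from the Super Soda example itself), the data scientists might join historical sales with each contextual factor by store and week:

```python
import pandas as pd

# Hypothetical source files -- names and columns are illustrative only.
sales = pd.read_csv("store_weekly_sales.csv")          # store_id, week, units_sold
weather = pd.read_csv("local_weather.csv")             # store_id, week, avg_temp, precipitation
events = pd.read_csv("local_events.csv")               # store_id, week, sporting_event_count
population = pd.read_csv("store_area_population.csv")  # store_id, population

# Join each contextual factor onto the historical sales, store by store and week by week.
training = (
    sales
    .merge(weather, on=["store_id", "week"], how="left")
    .merge(events, on=["store_id", "week"], how="left")
    .merge(population, on="store_id", how="left")
)

# A missing event count simply means no recorded events that week.
training["sporting_event_count"] = training["sporting_event_count"].fillna(0)

print(training.head())
```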

But which training algorithm should they use? What additional pruning of the data do they need? What parameters should they configure? Now they enter their experimentation phase, answering each of these questions in different combinations to see which model, paired with which dataset and which parameters, produces the best prediction results when verified against historical data.
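
A minimal sketch of what one round of that experimentation could look like, assuming the hypothetical training dataset from above and scikit-learn as the modeling library (the candidate models and parameter grids are illustrative, not prescribed):

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Candidate algorithms and hyperparameter grids -- the choices here are illustrative.
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [100, 300], "max_depth": [5, 10]}),
    "gradient_boosting": (GradientBoostingRegressor(random_state=0),
                          {"learning_rate": [0.05, 0.1], "n_estimators": [100, 300]}),
}

# `training` is the hypothetical dataset assembled in the previous sketch;
# the prediction target is weekly units sold.
X = training.drop(columns=["units_sold", "store_id", "week"])
y = training["units_sold"]

results = {}
for name, (model, grid) in candidates.items():
    # Cross-validated search over this model's hyperparameters,
    # scored against held-out slices of the historical data.
    search = GridSearchCV(model, grid, cv=5, scoring="neg_mean_absolute_error")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Report each candidate, best first (scores are negative mean absolute error).
for name, (score, params) in sorted(results.items(), key=lambda item: item[1][0], reverse=True):
    print(f"{name}: MAE={-score:.1f} with {params}")
```

Each pass through a loop like this is one experiment; a disappointing pass often sends the team backwards to the dataset itself rather than forwards to a better model.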

How agile affects both teams

For our team of software developers, an agile approach makes a lot of sense. The PM breaks up their equation into small pieces, with each sprint adding in feedback to improve the overall algorithm. While estimating the work is always hard, the PM still manages a level of predictability to plan and report around. Every sprint, the team produces some meaningful software to show stakeholders, who happily record the forward progress.

The first week, the software team iterates upon the algorithms that salespeople already use. The second week, they add in insights from grocery stores. The third week, they include additional equations from financial analysts and other data sources. By the month’s end, they have a fully functional algorithm that takes in insights from multiple sources to improve the accuracy of the prediction.

For our team of data scientists, agile breaks down. When the PM tries to compartmentalize the work, they end up not with business-value deliverables, but with distinct, non-deliverable phases. However the PM tries, they cannot get the data scientists to commit to an estimate for each of these phases. Sprint after sprint, the PM nervously watches as the team produces nothing but more experimentation documentation, ideas to refine their experiments, and developer-only datasets that cannot be used by the customer.

In the first week, the data scientists create their initial training dataset. In the second week, they experiment with various machine learning algorithms. Here, they generate some algorithms only to find issues in the data hindering their predictions. By the third week, they are back to pruning that initial training dataset and continuing their experiments. By the end of the month, they unfortunately conclude that no AI algorithm can be produced.

The race concludes

The PM for the software developers shares the state of their algorithm with stakeholders. They have additional opportunities to iterate the software forward to further enhance its accuracy. The stakeholders couldn’t be happier with their progress.

The PM for the data scientists reports that no algorithm was produced. Anything produced in the interim does not improve current processes. The stakeholders are furious. All the budget set aside, salaries paid, time spent, and it was for nothing. What happened?

The PM for the data scientists hangs their head and says that agile wasn’t implemented properly. Had they only kept the process more disciplined, they could’ve caught the issues earlier, redirected the ship, and produced a valuable result.

Did agile really make the difference?

Let’s look at why the project produced one result for the software developers and another result entirely for the data scientists. One of the most powerful things about agile is that, by iterating towards a product in valuable chunks, the team constantly delivers something. No longer will there be worries about a team going heads-down for a month only to throw up their hands in defeat. If they are going to fail, they will fail much sooner.

However, going heads-down for a period of time is exactly what data scientists must do. There are certainly some pieces of work that can act as milestones, such as creating the training dataset or iterating through experiments, but those aren’t necessarily valuable deliveries. They are arbitrary milestones that may be used to inaccurately report progress. Whether a data scientist has gone through 4 experiments or 10 says nothing about how close they are to achieving the result!

If data scientists were to be held too harshly to agile, the PM would introduce a slew of issues into the engineering process:

  • Data scientists deliver bad equations just to satisfy the PM – True story: once, a data scientist tasked with a prediction algorithm ended up providing an equation that simply outputted a rolling average just to satisfy their PM. For a while, this rolling average algorithm was a company’s AI Prediction Algorithm.
  • Data scientists rush through experimentation to meet arbitrary deadlines – The purpose of experimentation is to try a lot of things. If this is hindered by management, the data scientist will produce suboptimal results. With data science, iterations are much more difficult than in traditional software products. Improving or fixing a poorly built equation will be much more costly than squashing a few bugs in a traditional software product.
  • Data scientists must iterate forward, never backwards – One of the great things about agile is its incessant need to drive software delivery forward. Every iteration is built upon a previous delivery to create constant progress. But data science is not a straight shot toward the end. It is a looping path through a maze, and going backwards is oftentimes just as necessary as going forwards to complete the maze. Some data scientists may opt to hide work from their PM so as to appear to be making only forward progress and not backwards progress.

So what was the PM supposed to do?

Unfortunately, there isn’t a neatly packaged, heavily adopted, strictly compartmentalized silver bullet. The answer is that a PM must understand the general principles of machine learning and adapt their practices and reporting strategies around those principles.

A PM needs to be mindful of data science risks at the beginning of the project

Our hypothetical Super Soda PM for the team of data scientists went wrong at the very beginning of the project, when a result was promised. The PM set up an expectation that their team would deliver an improved algorithm. While this may be a low-risk promise in more traditional software development, with machine learning it is high-risk.

Instead, the PM should seek to de-risk early by identifying some common reasons why an AI project should not be undertaken:

  • The data is not ready – Data is oftentimes messy. Depending on its collection process, the data may be worse than messy: it may be incorrect. True story: once, a company wanting to predict manufacturing quality provided a massive dataset of manufacturing quality metrics, only for the data scientists to discover six months later that the people collecting this information had been doing it wrong for the past five years! The data was worthless.
  • The infrastructure is not available – Many do not recognize the infrastructure investments necessary for a smooth data science project. From data storage and processing to pipelines and model training, specific supporting platforms are needed to give data scientists a smooth development environment. If it becomes too difficult for a data scientist to work, they may not be able to iterate through enough experiments in time to find the right algorithm.
  • The problem is not clearly defined – This is an issue for both data science and traditional software products, but it is more acute for data science. A traditional software product can easily iterate towards a more refined problem statement later. In data science, iterations are not so easy and may require the complete recreation of datasets and retraining of complex models.

Even de-risked, a PM should always set the expectation that a data science project may fail for no other reason than that the problem is infeasible with the resources on hand. While technically this is a reason anything can fail, there are many more scenarios in data science where an AI algorithm is simply not possible.

A PM should be aware of what data scientists need to succeed

One of the most important things a PM can provide their data scientists is contextual understanding. Back to our hypothetical Super Soda PM for the team of data scientists: they did a very good job encouraging user interviews and stakeholder feedback. This is perhaps even more important in data science products than in other software products.

Data scientists must delve into the data and understand the story the data is telling. As with all datasets, there will be mistakes, discrepancies, general weirdness, etc. and a data scientist must use their contextual understanding of the space to correctly identify all these things. Then, they can clean their data to provide the training dataset for their machine learning algorithm. An incorrect understanding here will lead to incorrect filtering or cleaning of the data which may erase key patterns that the machine learning algorithm will have to identify.
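
To illustrate why that contextual understanding matters, here is a small, entirely hypothetical cleaning example: zero-sales weeks could be either data errors or genuine stock-outs, and only domain knowledge tells the data scientist which filter is correct.

```python
import pandas as pd

# Hypothetical weekly sales rows -- the columns and values are illustrative only.
sales = pd.DataFrame({
    "store_id":   [101, 101, 101, 102, 102],
    "week":       ["2023-01", "2023-02", "2023-03", "2023-01", "2023-02"],
    "units_sold": [240, 0, 310, 0, 180],
    "store_open": [True, True, True, False, True],
})

# Naive cleaning: treat every zero-sales week as a data error and drop it.
naive = sales[sales["units_sold"] > 0]

# Context-aware cleaning: stakeholders explain that a zero-sales week is real when the
# store was open (the product sold out or was never stocked); only closed weeks are noise.
informed = sales[sales["store_open"]]

# The naive filter silently erases the stock-out weeks -- exactly the pattern
# the prediction model is supposed to learn to prevent.
print(naive)
print(informed)
```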

The second most important thing a PM can provide is space. Experimentation is just that: experimenting. There is no guarantee of a result, and no predictability for agile processes to plan around. A PM must provide the space for data scientists to freely experiment while also determining when enough is enough. Several factors can drive this decision:

  • Evaluation metrics are good enough – There are numerous evaluation metrics that a data scientist may use to gauge how well their model is doing. It is important to understand them and agree upon them early on. Experimentation should continue only until the agreed evaluation metrics are hit. All metrics are situational, but a few common ones to consider are (see the sketch after this list):
    • Accuracy – the share of all predictions that are correct
    • Precision – the share of positive predictions that are actually positive
    • Recall – the share of actual positives that the algorithm successfully identifies
    • True / false positives – of the predictions marked positive, how many were actually positive versus not
    • True / false negatives – of the predictions marked negative, how many were actually negative versus not
  • Further improvement is unlikely – It is often easier to get model improvements by optimizing the things at the beginning of the data science process (i.e., training data, cleaning, feature selection) rather than the things at the end. If the front of the process has been optimized as far as it can be, there will oftentimes not be considerable gains from further experimentation. This is all situational, though, and should be a candid conversation between the PM and the data scientists.
  • Resource constraints are too high – Whether the resource is time or compute power, sometimes, a project is initially set up for some level of difficulty only for the data scientist to discover another level of difficulty altogether. The PM should have some gauge on the difficulties the data scientist is going through and whether the project is still feasible.
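
To make the metrics above concrete, here is a minimal sketch using scikit-learn and made-up labels for a hypothetical yes/no prediction (say, "will this store sell out this week?"); the exact metrics and thresholds a team agrees on will always be situational:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Made-up labels for a hypothetical yes/no prediction.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # what actually happened
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # what the model predicted

# Counts of true/false positives and negatives from the confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"True positives: {tp}, False positives: {fp}")
print(f"True negatives: {tn}, False negatives: {fn}")

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # share of all predictions that are correct
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # share of predicted positives that are correct
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # share of actual positives that were found
```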

A PM should change their stakeholder reporting strategy

Progress in data science is not shown in working code or iteratively improved algorithms. In fact, sprint-based progress reporting is generally a trap that promotes anti-patterns and arbitrary deadlines. Our hypothetical Super Soda PM of the data science team fell into this very trap, unable to accurately capture the progress their team was making because progress was defined in iteratively delivered code.

Instead, for data science, progress can be reported in a few ways:

  • Data science metrics – The purpose of experimentation is to increase the accuracy of predictions. The same evaluation metrics used above to gauge whether more experimentation is needed can also provide stakeholders an understanding of the product progress. If appropriately contextualized, this can serve as a powerful tool to show progress.
  • Data science phases – While there are definitely times when a data scientist must go backwards in order to go forwards, much of their work can be broken down into distinct phases. Here are seven common phases: problem formulation, data availability, data preparation, exploratory data analysis, model training and tuning, model evaluation, and model deployment.
  • Data science experiments – While not as valuable for representing real progress, the number of experiments at least tracks the amount of effort being put into the project. Not all experiments are equal in effort, but this can serve as a gauge to assure stakeholders that work is being done.

A PM should be able to understand and interpret the results

Going back to our hypothetical Super Soda PM for the data science team, had stakeholder expectations been appropriately managed, the stakeholder would not have been as surprised by the result of the project. Even beyond the result, all the experimentation and discovery that the data science team conducted is innately valuable to the organization.

An organization without the right data to enable a data science project is likely experiencing the same shortfall in many of the other areas where it would like to apply data science. Understanding why the data does not work, and providing recommendations to fix it, will oftentimes provide more organizational value than even the original algorithm the team set out to create. Think of the dozens of other data science projects that would’ve stalled or failed had this team not dug into the data so deeply.

Even in success, a PM should be able to appropriately convey next steps for a machine learning model. If we go back to our hypothetical Super Soda PM for the data science team, here are some recommendations that could be given:

  • Scheduled retraining and reverification of the model – As sales data continues pouring in and our evaluation metrics are updated, we can use those metrics and data to push further optimizations into our model. If we do not, our model will become out of date and fail to adapt to changing environments.
  • Context must be passed to end users – Our salespeople are experts in their field and will have a lot of opinions on the model recommendations especially if it goes against what they know. In order to make sure the model is trusted, we must pass appropriate context on what the model predicts and how to use it so that our end users will be willing to engage with our prediction.
  • Data delivery pipelines must be managed or built – In training the model, we grabbed historical data. In using the model, we must have present-day data. Thus, data pipelines will need to be built to feed grocery store sales, local event data, weather data, etc. into our model so that it can continue making up-to-date predictions.

All these recommendations involve not just a team of data scientists, but a fully balanced team involving PMs, product designers, software developers, and data scientists. The upkeep of the model is just as important and often just as complicated as building the model itself.

With all this complexity, why do data science at all?

Data science doesn’t fit neatly into many of our most trusted software management methodologies, and actually introduces greater risk than iterative development. So, why would anyone take such a deal?

The answer is that when the risk is great, so too is the reward.

In our hypothetical race, yes, the software developers were able to produce a prediction algorithm, but that algorithm is more of the same. It is an algorithm that takes in how a salesperson, a financial analyst, and individual stores predict sales today, and simply combines the three in some variation, hoping to increase overall accuracy. The algorithm they produce will suffer the same drawbacks as the algorithms it drew from. The same patterns, intuition, and guesswork are used.

Now let’s look at our machine learning algorithm. A machine learning algorithm will comb through the historical datasets, run a million-plus simulations, and find connections and patterns beyond anything a human could. It can take in all the factors that we thought were important and determine whether they actually were. It can take in many more factors we never even thought to consider, so long as the data scientists had the time and freedom to add in the data.

Perhaps Super Soda sales spike on Sundays not due to football, but because Super Soda is a staple of churchgoers. Such incorrect assumptions made by humans would be retained in the software developers’ equation but challenged in the data scientists’ equation.

We resort to machine learning algorithms when we want something better than we can produce with common human sense. While traditional software development can iterate upon our prediction algorithm and create efficiency gains, it will ultimately only ever be more of the same. Machine learning changes the game.

Learn more about data science

For more detail about how teams are structured around data science products, please review the VMware Data Science Playbook.