The emergence of consumer-facing AI models such as ChatGPT and Midjourney has brought the world’s attention to the mind-bending innovation that happens when data science successfully plugs into software product teams.
When it works, it’s amazing. But that’s not always the outcome.
At VMware Tanzu Labs, we’ve been working with data and data science for a long time, and along the way we have formed some opinions on how to bring data science into product development teams. A recurring sticking point when data science meets product organizations is a failure to understand the unique pace of data science work.
This brings me to a story about a bear.
I was on a team that was building a data science model for a mobile product. While there were several directions we could pursue, my team was 100 percent focused on improving the accuracy of our predictions in line with our goals. We ran many different experiments, analyzed the outputs, and agonized over which changes should make it into our production code and which should be dropped. Every time we picked up a story in our backlog, we said, “Maybe this is it! Maybe this is the change that will get us to the results we want.” It usually wasn’t, but it gave us new ideas of what else to try.
I would go to meetings with the other teams in the portfolio—the ones building the mobile apps that our model would be incorporated into. Their status updates would be, “We just delivered this feature, we are currently working on this other set of features that will help us achieve this outcome, we found a couple bugs that we fixed, and our analytics show that this integration isn’t getting much use so we want to reconsider maintaining it.”
Mine would be, “We’re still working on the same problem.” Yes, I could tell them why, what we were doing to solve it, and what approaches we had tried, but the stakeholder-level bottom line was just that we were spinning our wheels.
After several weeks of this, I felt the need to explain why my status updates were different from my colleagues’.
“The difference,” I said, “is that your teams are hiking up a mountain, and our team is hunting a bear.”
Please note that I am a vegetarian and an animal lover and would never actually hunt a bear. But for some reason, that’s the image that popped into my mind.
“Your teams are looking up at your goal, and there are different ways of getting there. But it’s a known quantity. It has an altitude, a latitude, and a longitude. You have to figure out what approach to take: the short, strenuous one; the long, relaxing one; or the one with the best views. Now, there might be obstacles along the way: muddy paths, fallen trees, flooded rivers. Or there might be weather conditions you have to factor in. But you have a loose sense of what it’s going to take to get up that mountain.
“I’m hunting a bear. I’m on the same mountain as you, but with a different goal. The difference is, I don’t know where mine is. I’m looking for it. Yesterday, we saw what might have been bear tracks, and my team went to look around the area, but they couldn’t find the bear. Hopefully, if we follow those tracks, we will be led to it; but we might not. In fact, there might not even be a bear. It could have wandered off to another mountain. I think, based on my experience and with the evidence we’ve collected so far, that there is one, and we should keep looking. However, there’s no guarantee of that like there is with the mountaintop.
“Mind you, I have a very clear goal: my equivalent of a bearskin rug, an accurate data science model. I know exactly what I want to do once I find the bear. I just don’t know when or where I’m going to find it, or if it’s even possible. But my team and I, we’re following the clues. We think we’re getting close.”
Everybody cracked up, picturing me as Elmer Fudd with a double-barreled shotgun, setting out traps. But I think they got the idea, and at least my status update was more interesting that week.
My colleagues and I at Tanzu Labs have partnered with clients across many industries on projects like this one, working with their data scientists to build and productize their most cutting-edge work. Along the way, we have seen the pain points that organizations face as they try to incorporate the discipline and tools of data science into their software practices. How do we tear down silos between data science and development? How do we write automated tests when we don’t know what the outputs will be? Can data science be agile?
Over the last few years, we have been experimenting with approaches, trying out new ideas, and circulating our findings with each other. Now, in the spirit of our Tanzu Labs objective to share our knowledge with the world, we have pooled the best of our practices for working with data science into a single guide: Data Science and the Balanced Team.
To my software engineers, designers, product managers, delivery leads, and fellow nature lovers on Software Mountain, this guide was written to help you step off the trail and start hunting. Our goal is to walk you through the differences between working on a traditional software product and one that incorporates data science, show you how your existing skill sets apply to this type of undertaking, and provide you with some high-level information and data points to get you started. Whether or not you end up finding what you’re looking for, we want to equip you for a safe, productive, and fun journey through the woods.
Get ready, my friends. It’s data season.