By Dr. Malini Bhandaru, Anna Jung, and Pramod Jayathirth
We recently got to play DeepRacer, the popular machine learning racing game from Amazon Web Services (AWS), in a version hosted specifically for VMware. DeepRacer challenges players to move a virtual car around a circuit as fast as possible by tweaking the parameters that determine how the car learns to improve its performance.
The game is intended to be a fun introduction to the practicalities of reinforcement learning, a machine learning paradigm where intelligent agents adapt their behavior to maximize a cumulative reward in hard-to-codify problem spaces. All of the action takes place on the AWS platform and until very recently was kept within a proprietary wrapper (more on that in a bit), so Amazon’s interest in showcasing AWS as a machine learning platform here is pretty transparent.
For us, however, the appeal of playing around with DeepRacer was twofold. Firstly, we had already been looking to extend our collective hands-on experience with machine learning in general and with reinforcement learning in particular. The game offered a quick and enjoyable way to do that within a relatively constrained challenge. More importantly, though, our team has been reviewing the broad landscape of open source machine learning platforms and tools, looking to understand which features are especially valuable and where we might make the most impactful contribution. Playing DeepRacer helped us decide whether we were on the right track.
So what did we learn? DeepRacer first asks you to define aspects of your car’s performance and gives you a few simple reward algorithms to start out with. Then you get to play around with those definitions and rewrite the reward functions to try to complete the race circuit with greater accuracy and speed. Your car wins by staying on the virtual racetrack while going around it faster than anyone else’s.
Making progress, we discovered, was both challenging and addictive. We were essentially teaching an intelligent agent how to drive from scratch, and there were a lot of metrics to consider. Defining the reward functions was non-trivial: we had to decide, for example, whether to prioritize speed over steering radius or staying on track, and if so, by how much. Still, our two teams made it to the game’s final round, and we reduced our circuit times from 30–40 seconds in the first round to under 12 seconds by the time we were done.
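To make that concrete, a DeepRacer reward function is just a Python function: the simulator passes it a dictionary of vehicle and track state (speed, steering angle, distance from the center line, whether all wheels are on the track, and so on) and it returns a score the agent tries to maximize. Below is a minimal sketch in the spirit of the starter functions the game provides, rewarding the car for hugging the center line; the percentage markers are illustrative defaults, not values we tuned.

```python
def reward_function(params):
    """Reward staying near the center line; penalize leaving the track."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    all_wheels_on_track = params['all_wheels_on_track']

    # Off the track: near-zero reward so the agent learns to avoid it.
    if not all_wheels_on_track:
        return 1e-3

    # Reward shrinks as the car drifts away from the center line.
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3  # almost off the track

    return float(reward)
```

Most of our experimentation amounted to reshaping functions like this one, adding terms for speed or steering angle and reweighting them run after run.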
It’s easy to see how some folks end up obsessing over getting better and better at this one particular task. But we were more interested in the broader insights we could draw from the experience. For one thing, it brought home just how important it is to write good reward functions. It also underlined the amount and complexity of data tracking required in machine learning. We had to track, for example, which changes improved our times (and which set us back) across many, many trial runs. Then there were data tasks the game took care of, like deciding how and where to save both our results and our models; noting those nevertheless deepened our understanding of machine learning operating practices (aka MLOps).
Perhaps most significantly, the experience helped us both appreciate and think more critically about the many machine learning tools and sub-projects that are bubbling up in the open source community and ask what might still be missing. Do we have the best possible machine learning project tracking tools available to us, for example? And how can what’s currently on offer be improved?
In addition, the game made us think about adjacent open source services that could help projects run reinforcement learning tasks more successfully, like better ways to search for, combine, and share reward functions among users.
Until just a few weeks ago, anyone playing DeepRacer had to play it within an AWS-proprietary framework, even though the car itself is built on an Ubuntu-based computer powered by the open source Robot Operating System (ROS). Needing to use AWS resources to store, test and train our models shed light on just how problematic it is to be locked into any specific vendor when running machine learning tasks. Interestingly, AWS seems to have come to the same conclusion: a couple of weeks ago, Amazon announced it was open sourcing the DeepRacer game and the attendant competition built up around it.
The final takeaway from our gaming experience was that it confirmed our interest in contributing to two open source ML projects in particular: ONNX and Kubeflow.
ONNX is an open format for representing machine learning models that is agnostic to both framework and hardware platform: you can build a model that learns in one framework or cloud and then deploy it on any other. That kind of flexibility is clearly foundational for open source users looking to exploit the power of machine learning while retaining the maximum freedom to run, track, save and refine their machine learning applications whenever and wherever they want, and it’s something we’re keen to support.
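As a hedged illustration of that portability (assuming PyTorch and ONNX Runtime are installed, and using a throwaway two-layer model rather than anything we actually raced), the round trip from one framework to a framework-neutral runtime looks roughly like this:

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# A throwaway model standing in for whatever you trained.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# Export to ONNX; the example input fixes the graph's shapes.
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# model.onnx can now run anywhere an ONNX-compatible runtime exists,
# with no dependency on the framework that produced it.
session = ort.InferenceSession("model.onnx",
                               providers=["CPUExecutionProvider"])
outputs = session.run(None, {"input": dummy_input.numpy()})
print(outputs[0])
```

The same file could just as well be served from a different cloud or an edge device, which is the kind of vendor-neutral flexibility we found ourselves wanting while racing.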
While playing DeepRacer, we needed to track vehicle capabilities like speed and turning angle, reward functions, model training parameters (hyperparameters) and the models themselves. For non-reinforcement learning tasks, we also have data collection, transformation, labeling and splitting into test and train subsets, and (ideally) monitoring deployed models for drift. Our interest in Kubeflow and its sub-projects lies in its support for these common aspects across all of ML. Being able to specify and automate these steps, akin to what’s required with continuous integration/continuous deployment (CI/CD) in software development and deployment, will help democratize ML and make it more reproducible.
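To sketch what that automation can look like (a hedged, minimal example using the v1-style Kubeflow Pipelines SDK; the container images and arguments here are hypothetical placeholders, not real components), a pipeline declares each step and its ordering, then compiles to a spec Kubeflow can run repeatably:

```python
import kfp
from kfp import dsl

def preprocess_op():
    # Hypothetical image that collects and transforms the raw data.
    return dsl.ContainerOp(
        name="preprocess",
        image="example.com/ml/preprocess:latest",
        arguments=["--output", "/data/train.csv"],
    )

def train_op():
    # Hypothetical image that trains a model from the prepared data.
    return dsl.ContainerOp(
        name="train",
        image="example.com/ml/train:latest",
        arguments=["--data", "/data/train.csv",
                   "--model-out", "/models/model.onnx"],
    )

@dsl.pipeline(name="racer-training",
              description="Preprocess the data, then train a model.")
def training_pipeline():
    preprocess = preprocess_op()
    train = train_op()
    train.after(preprocess)  # explicit step ordering

if __name__ == "__main__":
    # Compile to a workflow spec that can be uploaded to Kubeflow
    # and re-run identically, commit after commit.
    kfp.compiler.Compiler().compile(training_pipeline, "pipeline.yaml")
```

Versioning a file like this alongside the model code is what gives ML work the CI/CD-style repeatability described above.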
Thanks to spending many late nights obsessing about sending a small, self-learning car around a virtual track as fast as we could, we’re ready to roll up our sleeves and make some targeted — and with luck, impactful — contributions to the development of open source machine learning. It all begins with tackling a “good-first-issue.” We’ll let you know how we get on.