I joined VMware’s open source team a couple of months ago on a mission to explore the machine learning open source landscape and help ensure VMware is always at the forefront of ML technology. Two of the early outcomes were an interesting ML build pattern and a particularly useful Tekton Task named prometheus-gate. Both will be part of my session at KubeCon + CloudNativeCon Europe 2020 later this month, and I’d like to tell you a little bit about them.
A couple of things led me down this road. In my prior roles, I had worked more or less exclusively on PaaS (platform as a service) and IaaS (infrastructure as a service) offerings, and my work as an engineer was becoming increasingly event-oriented. Gone was the static, manually configured infrastructure; in its place were responsive, autoscaling application platforms, with event-driven systems behind their operations. Somewhere along the way, I decided the solutions to my long-term interests all required machine learning, so I made the (great) decision to focus entirely on open source ML.
Enter COVID-19 Modeling
The other impetus was one anyone can relate to: COVID-19. As soon as I came on board, I became involved in internal predictive modeling around COVID-19, and I immediately ran into roadblocks. I needed a simple way to save and store trained models, ideally one I could integrate with my existing Tekton Pipelines image-building resources. I also needed to be able to share experiments with team members and try theirs as well. After some experimentation, I concluded that BentoML was the best solution for these goals and worked to extend it to function in a CI/CD pipeline.
I had an additional, more advanced requirement: safely automating the release of new inference services. For that I decided to use KFServing and its canary deployment model, which meant I needed a new component that could block or pause a Tekton Pipeline until certain criteria were met, for example a deployment’s error rate staying below 5% for 30 minutes. These criteria are defined over time-series data, and I am using Prometheus because it provides enough data and our application does not require high cardinality. We want to watch trends over a period of time and ensure they meet the SLO we have defined before proceeding, which gives us a good answer for how to automatically promote our KFServing canary.
To accomplish this, I created a simple “gate” app in Go that queries the Prometheus API with a defined range query. I then created a Tekton Catalog Task named prometheus-gate, which lets the gate sit inside a Tekton Pipeline as a Task that waits for its criteria to be met before the Pipeline proceeds. With this, I could assemble my simple system with BentoML.
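To make that concrete, here is a minimal sketch of how such a gate can work. This is not the prometheus-gate source; the Prometheus URL, the PromQL error-rate query, and the threshold and window values are illustrative assumptions. It polls the Prometheus query_range endpoint and only exits successfully once every sample in the window is under the threshold.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"time"
)

// rangeResponse models the subset of the Prometheus HTTP API response
// we care about: each result series carries [timestamp, value] pairs.
type rangeResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Values [][2]interface{} `json:"values"`
		} `json:"result"`
	} `json:"data"`
}

// belowThreshold runs one range query and reports whether every sample
// in the window stayed under the threshold. Note that an empty result
// opens the gate; a real gate would treat that as inconclusive.
func belowThreshold(promURL, query string, window time.Duration, threshold float64) (bool, error) {
	now := time.Now()
	params := url.Values{}
	params.Set("query", query)
	params.Set("start", strconv.FormatInt(now.Add(-window).Unix(), 10))
	params.Set("end", strconv.FormatInt(now.Unix(), 10))
	params.Set("step", "30s")

	resp, err := http.Get(promURL + "/api/v1/query_range?" + params.Encode())
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var rr rangeResponse
	if err := json.NewDecoder(resp.Body).Decode(&rr); err != nil {
		return false, err
	}
	if rr.Status != "success" {
		return false, fmt.Errorf("query failed with status %q", rr.Status)
	}
	for _, series := range rr.Data.Result {
		for _, sample := range series.Values {
			// Prometheus encodes sample values as strings.
			v, err := strconv.ParseFloat(sample[1].(string), 64)
			if err != nil {
				return false, err
			}
			if v >= threshold {
				return false, nil
			}
		}
	}
	return true, nil
}

func main() {
	// Hypothetical inputs; the real Task takes these as parameters.
	promURL := "http://prometheus.monitoring:9090"
	query := `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))`
	for {
		ok, err := belowThreshold(promURL, query, 30*time.Minute, 0.05)
		if err == nil && ok {
			fmt.Println("SLO met; opening the gate")
			os.Exit(0) // a zero exit lets the Task succeed and the Pipeline proceed
		}
		time.Sleep(time.Minute)
	}
}
```

Because a Tekton Task step passes or fails on its exit code, exiting zero only once the SLO holds is all the gate needs in order to block or release the rest of the Pipeline.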
Leveraging BentoML
BentoML is a really interesting project that works nicely for serving ML model APIs, building a continuous integration machine learning pipeline, and sharing models with teammates. It provides a useful common pattern for many model types along with a powerful CLI. I contributed the “retrieve” command to the BentoML CLI so that it can fetch specific ML models and their build environments from inside automation such as Pipelines. This helps make BentoML useful in a non-interactive session, such as a CI pipeline.
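Inside a Pipeline step, that pattern looks roughly like the sketch below, which shells out to the CLI the way a non-interactive build step would. Treat it as a sketch: the model tag, the --target_dir flag, and the paths are assumptions based on my recollection of the BentoML 0.x CLI, not a definitive invocation.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// retrieveModel invokes the BentoML CLI non-interactively, failing the
// step if the command fails, which is exactly what a CI Pipeline wants.
func retrieveModel(tag, targetDir string) error {
	cmd := exec.Command("bentoml", "retrieve", tag, "--target_dir", targetDir)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// Hypothetical model tag and workspace path.
	if err := retrieveModel("IrisClassifier:latest", "/workspace/model"); err != nil {
		fmt.Fprintln(os.Stderr, "retrieve failed:", err)
		os.Exit(1)
	}
}
```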
When used in conjunction with Tekton Triggers, it’s possible to do a number of really interesting things. I proposed and helped create Tekton Triggers to provide a way to run Pipelines based on event payloads. It may sound simple, but the TriggerTemplate resource lets you create templated Tekton resources, which is deceptively powerful and exactly what I used to tie all these pieces together.
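Here is a rough sketch of what that looks like from the outside. Tekton Triggers exposes an EventListener as a cluster service (named el-&lt;listener&gt; and listening on port 8080 by default), so firing the templated Pipeline is just an HTTP POST of a small JSON payload. The listener name and payload fields below are hypothetical; your TriggerBindings decide which fields actually matter.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical payload; the fields your TriggerBindings extract
	// depend entirely on how you wrote them.
	payload, _ := json.Marshal(map[string]string{
		"model":   "IrisClassifier",
		"version": "latest",
	})

	// Hypothetical in-cluster EventListener service URL.
	resp, err := http.Post(
		"http://el-ml-pipeline.default.svc.cluster.local:8080",
		"application/json",
		bytes.NewReader(payload),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("EventListener responded:", resp.Status)
}
```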
The Simple Inference Pipeline
From a small HTTP payload, you get a system that responds by retrieving your machine learning model’s context, building it into an image, deploying that image with KFServing, and automatically promoting your canary, with the prometheus-gate system ensuring SLOs are met first. The result is a container-based machine learning API exposed for your model. Best of all, most of the Task components are already merged and available in the Tekton Catalog.
If you’d like to learn more about this idea and my thinking behind it, or see the demo I’ve built to show how it can work, check out my KubeCon Europe virtual presentation on August 19th.
And watch this space as our team thinks more about deploying machine learning into open source projects in other new and interesting ways.