Scaling a model in response to user demand is crucial for bringing a machine learning model into production. In this blog post, we follow up on our previous post by showing how to scale this model in production using Pivotal Cloud Foundry (PCF).
Pivotal Cloud Foundry makes it easy to scale an application with no downtime, using either the command line interface (CLI) or the Apps Manager. We use Apps Manager to scale our application out horizontally by spinning up new instances of our model, while PCF’s load balancer automatically routes new requests to the appropriate instances.
Using the sentiment analysis model we built with Pivotal Greenplum and Python, we built a dashboard for analyzing live Tweets from the Twitter firehose. To demonstrate the scalability of our model, we simulate a load of over 100,000 Tweets per second and scale our app out to 10 instances, which together analyze that entire load: nearly 15x the average total number of Tweets per second from all of Twitter!
Watch below for a demonstration of the entire process.
The application framework and scaling the model
We built a sentiment analysis model in Python using PL/Python with scikit-learn in Pivotal Greenplum and persisted this model as a pickle object. We then built a Python Flask application serving this model using an API, which we then deployed using PCF. In addition, we built a simple dashboard application using Flask showing live Tweet examples along with statistics regarding sentiment analysis.
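To make the serving step concrete, here is a minimal sketch of the scoring logic that a Flask POST route could wrap. It is not the code from our application: the JSON shapes, the function names, and the StubModel class are assumptions for illustration, and we assume the pickled model exposes a scikit-learn style predict_proba that accepts raw text.

```python
import json
import pickle


def load_model(path):
    """Load a pickled model from disk (the path is supplied by the caller)."""
    with open(path, "rb") as f:
        return pickle.load(f)


class StubModel:
    """Hypothetical stand-in for the real pickled scikit-learn model."""

    def predict_proba(self, texts):
        # Return [P(negative), P(positive)] per text, like a binary classifier.
        return [[0.2, 0.8] if "good" in t.lower() else [0.7, 0.3] for t in texts]


def score_request(model, body):
    """Map a JSON request body to (HTTP status, JSON response body).

    Assumed request shape:  {"tweets": ["text", ...]}
    Assumed response shape: {"scores": [float, ...]}
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body is not valid JSON"})

    tweets = payload.get("tweets") if isinstance(payload, dict) else None
    if not isinstance(tweets, list) or not all(isinstance(t, str) for t in tweets):
        return 400, json.dumps({"error": "expected a list of strings under 'tweets'"})
    if not tweets:
        return 200, json.dumps({"scores": []})

    # Use the positive-class probability as the sentiment score.
    scores = [float(p[1]) for p in model.predict_proba(tweets)]
    return 200, json.dumps({"scores": scores})


if __name__ == "__main__":
    status, response = score_request(StubModel(), '{"tweets": ["good day", "awful"]}')
    print(status, response)
```

Keeping this logic free of per-request state is what lets every instance of the app answer any request, which is the property horizontal scaling relies on.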
Our model analyzes live Tweets sent directly from the Twitter firehose via POST requests. A random subset of these Tweets and their sentiment scores is displayed on the dashboard every 5 seconds. In addition, a separate Python microservice computes the number of Tweets scored per second and the average sentiment, which are also displayed on the application dashboard using D3.js. The following diagram demonstrates the entire architecture.
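Separately from the diagram, the statistics computed by the microservice (Tweets scored per second and average sentiment) can be sketched as a sliding-window counter. This is a minimal illustration rather than our actual microservice: the RollingStats class, its method names, and the 5-second window are assumptions.

```python
from collections import deque


class RollingStats:
    """Track Tweets scored per second and mean sentiment over a sliding window."""

    def __init__(self, window_seconds=5.0):
        self.window_seconds = window_seconds
        self._batches = deque()  # entries are (timestamp, count, score_sum)
        self._count = 0
        self._score_sum = 0.0

    def add_batch(self, timestamp, scores):
        """Record one scored batch that arrived at `timestamp` (in seconds)."""
        batch_sum = sum(scores)
        self._batches.append((timestamp, len(scores), batch_sum))
        self._count += len(scores)
        self._score_sum += batch_sum
        self._evict(timestamp)

    def _evict(self, now):
        # Drop batches that have fallen out of the window ending at `now`.
        while self._batches and self._batches[0][0] <= now - self.window_seconds:
            _, count, batch_sum = self._batches.popleft()
            self._count -= count
            self._score_sum -= batch_sum

    def snapshot(self, now):
        """Return the current rate and mean sentiment for the dashboard."""
        self._evict(now)
        rate = self._count / self.window_seconds
        mean = self._score_sum / self._count if self._count else None
        return {"tweets_per_second": rate, "average_sentiment": mean}
```

A dashboard could poll snapshot() on a timer and hand the resulting dictionary to the front end as JSON.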
To evaluate the scalability of our model, we use the load testing tool Locust to send HTTP POST requests containing synthetic Tweets to our sentiment analysis model. This framework simulates multiple “users”, each of which sends requests containing a batch of 1,000 Tweets to be analyzed at random intervals, roughly every 7 seconds.
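What each simulated user does can be sketched in plain Python. This is not a Locust script and not our load-testing code; it uses only the standard library to show how a batch of synthetic Tweets and a random wait might be generated. The word list, the payload shape, and the uniform wait distribution with a 7-second mean are assumptions.

```python
import json
import random

BATCH_SIZE = 1000        # Tweets per request, as described above
MEAN_WAIT_SECONDS = 7.0  # assumed mean gap between one user's requests

# Hypothetical vocabulary for generating synthetic Tweets.
WORDS = ["great", "terrible", "coffee", "monday", "love", "traffic", "happy", "late"]


def synthetic_tweet(rng, max_words=12):
    """Build one fake Tweet from 3 to `max_words` random words."""
    n_words = rng.randint(3, max_words)
    return " ".join(rng.choice(WORDS) for _ in range(n_words))


def build_payload(rng, batch_size=BATCH_SIZE):
    """Return the JSON body for one POST request (assumed shape)."""
    return json.dumps({"tweets": [synthetic_tweet(rng) for _ in range(batch_size)]})


def next_wait(rng, mean=MEAN_WAIT_SECONDS):
    """Draw a wait uniformly from [0, 2 * mean], so the average gap is `mean`."""
    return rng.uniform(0.0, 2.0 * mean)


if __name__ == "__main__":
    rng = random.Random(42)
    body = build_payload(rng)
    print(len(json.loads(body)["tweets"]), "Tweets in one request")
    print("next request in", round(next_wait(rng), 2), "seconds")
```

In a real Locust test, the payload would be posted by a task method and the wait would be configured on the user class; the sketch only covers the data each request carries and the pacing between requests.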
We first simulate 100 users, which together send approximately 10,000 Tweets per second, a load that is manageable for a single instance of our application. However, to analyze significantly more data, we need to scale our application out horizontally. In the Apps Manager, we simply spin up 9 more instances of our application while increasing the load tester to 1,000 “users”, resulting in 100,000+ Tweets to be analyzed per second. PCF handles the load balancing for our application, so as the data volume increases, PCF automatically routes each request to an appropriate instance of our app.
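The arithmetic behind these numbers can be checked with a short sketch. It treats 7 seconds as the mean gap between a user's requests and uses a per-instance capacity of 15,000 Tweets per second; both are assumptions chosen for illustration, not measurements from our tests.

```python
import math


def offered_load(users, batch_size, mean_wait_seconds):
    """Aggregate Tweets per second sent by all simulated users."""
    return users * batch_size / mean_wait_seconds


def instances_needed(load, per_instance_capacity):
    """Smallest number of instances whose combined capacity covers the load."""
    return math.ceil(load / per_instance_capacity)


if __name__ == "__main__":
    ASSUMED_CAPACITY = 15_000  # hypothetical Tweets per second for one instance

    for users in (100, 1000):
        load = offered_load(users, batch_size=1000, mean_wait_seconds=7.0)
        count = instances_needed(load, ASSUMED_CAPACITY)
        print(f"{users} users: about {load:,.0f} Tweets/s, {count} instance(s)")
```

Under these assumptions, 100 users fit on a single instance and 1,000 users call for 10 instances, which matches the one-to-ten scale-out described above.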
For more information on the application including the code, architecture, and a live demo, check out the GitHub repository. Of course, this is just an example and we can use this framework to scale other models for solving large-scale machine learning problems.