AI/ML for Enterprise: Part 2 (Training & Inference)

Why are performance and multitasking important?

We’re all impatient these days: we want things fast and we want them all at once, especially when it comes to technology. In 2010, Intel even ran an ad campaign about how watching the hourglass whilst waiting for something to load can lead to stress! They called it “Hourglass Syndrome”, and I’m sure we can all relate.

But why is it that we hear AI/ML workloads in particular need more performance? Is it because we’re all so impatient and want the intelligent result faster? Is it because there is so much to process that the added compute power really is required? Well, it turns out it’s a combination of both.

So let’s take a look at the two stages that make AI/ML workloads so resource-hungry: Training and Inference. Today, I’m going to break down these two concepts to highlight the demanding requirements of running machine learning workloads.


Training

The word “Training” in AI/ML is used in the same way we use it in our regular lives, for example when training for a new skill (data science perhaps!). You will want to look at a lot of data, in the form of study books, online content etc. You’ll read through this new training material and make sense of it using your brain. Once you’ve understood it, you can memorize the pieces you believe to be most important, or maybe just memorize how to find them (which book, which page etc.).

Training in AI/ML works in a similar way. We use a neural network (see my AI/ML demystified video for a simple explanation!) to look at a vast amount of data and learn as much about it as possible. Rather than filing everything you give it away in a kind of database or data lake, the network repeatedly adjusts its internal parameters (its “weights”) until it captures the patterns in the data.
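To make “learning from data” a little more concrete, here is a minimal sketch of a training loop in plain Python. It is nowhere near a real neural network: it fits a single straight line to made-up sales numbers using gradient descent, and the function name, learning rate and data are all my own assumptions for illustration. The basic idea is the same though: guess, measure the error, nudge the parameters, repeat.

```python
# Toy "training" loop: fit y = w * x + b to example data by gradient descent.
# The data, learning rate and epoch count are made up for illustration.

def train(samples, epochs=5000, learning_rate=0.01):
    """Return (w, b) that approximately minimise the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in samples:
            error = (w * x + b) - y        # how far off is the current guess?
            grad_w += 2 * error * x / n    # gradient of the error w.r.t. w
            grad_b += 2 * error / n        # gradient of the error w.r.t. b
        # Nudge the parameters a small step in the direction that reduces error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical history: (week number, units sold), generated from y = 3x + 5.
history = [(x, 3 * x + 5) for x in range(10)]
w, b = train(history)
print(f"learned w={w:.2f}, b={b:.2f}")
```

Even this toy loop visits every sample thousands of times, which hints at why training on real data sets takes so much compute.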

A good example of this is supply chain forecasting. To forecast what people might buy from your online store, you will want to look at the history of everyone’s purchasing habits and predict based on trends. You need to know who bought what and when. This should be easy to get from your order database.

This type of forecasting can be done without machine learning, but to make it intelligent, you need to account for any other factors which might help the model. What was bought at the same time? How long does the product typically last before it needs replacing? What was the weather like on that day/week/month/year? If it’s raining, people don’t tend to buy sunglasses, for example. ML can utilize data in ways that humans just couldn’t, or wouldn’t think to. As you can imagine, this data will come from a variety of sources, so just looking at your order books isn’t enough to give an “intelligent” prediction.
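As a rough sketch of what combining sources can look like, the snippet below joins some order history with weather records by date, so each row carries both what was sold and what the conditions were. Everything here (the field names, the numbers, the `build_training_rows` helper) is hypothetical and only meant to illustrate the idea.

```python
from datetime import date

# Hypothetical order history, as it might come out of an order database.
orders = [
    {"day": date(2021, 6, 1), "product": "sunglasses", "units": 40},
    {"day": date(2021, 6, 2), "product": "sunglasses", "units": 5},
    {"day": date(2021, 6, 3), "product": "sunglasses", "units": 35},
]

# Hypothetical weather records from a completely separate source, keyed by day.
weather = {
    date(2021, 6, 1): {"rain_mm": 0.0, "max_temp_c": 24},
    date(2021, 6, 2): {"rain_mm": 12.5, "max_temp_c": 16},
    date(2021, 6, 3): {"rain_mm": 0.0, "max_temp_c": 26},
}


def build_training_rows(orders, weather):
    """Attach the weather for each order's day to the order itself."""
    rows = []
    for order in orders:
        conditions = weather.get(order["day"])
        if conditions is None:
            continue  # skip orders with no matching weather record
        rows.append({**order, **conditions})
    return rows


rows = build_training_rows(orders, weather)
for row in rows:
    print(row["day"], row["units"], row["rain_mm"])
```

Each extra source adds more columns to every row, which is exactly how the data set grows so quickly.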

The more relevant data sources you can provide, the more accurate the model’s predictions tend to be. The data set quickly becomes extremely large, though, and CPUs limit how much of it you can process simultaneously. GPUs, however, are built to run thousands of simple calculations in parallel, so they can often get through this kind of work many times faster.
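Here is a small sketch of the idea behind parallel processing: split the data into independent chunks, hand each chunk to a worker, and combine the partial results. I’m using Python’s standard `ThreadPoolExecutor` purely to show the shape of the work. It is an assumption-laden illustration, not a benchmark, and in standard Python threads won’t actually speed up this kind of arithmetic. A GPU applies the same pattern across thousands of cores in hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(chunk):
    """The same simple calculation, applied independently to one chunk."""
    return sum(v * v for v in chunk)


values = list(range(1_000))  # made-up stand-in for a large data set

# Four workers each process one chunk; no chunk depends on any other.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum_of_squares, chunked(values, 250)))

total = sum(partials)  # combine the partial results
print(total)
```

The key point is that no chunk needs to wait for any other, which is what makes this kind of work a good fit for parallel hardware.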


Inference

“Infer – verb – deduce or conclude (something) from evidence and reasoning rather than from explicit statements.”

By now you’ve been through the training part of the process. You’ve done all of your studying by taking these various data sources, and you’ve trained a model to predict future purchasing habits using the process we talked about above. You now have an “intelligent” model, which knows how to sift through any similar data sources and apply the same intelligence.

Now it’s time to use all of that studying and apply the intelligence in the exam. Using inference, we can deduce or conclude what is most likely to happen next. In our forecasting example, we can now apply the trained model to new data, perhaps even real-time data. The model knows what correlations and anomalies to look for and what typically works, based on the training results.
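In code, inference can be as simple as plugging new data into parameters that training has already worked out. The sketch below assumes a tiny linear model. The weights, the bias and the `predict_units` function are hypothetical values I’ve made up to stand in for the output of a real training run.

```python
# Hypothetical parameters, standing in for the result of an earlier training run.
TRAINED_WEIGHTS = {"max_temp_c": 1.5, "rain_mm": -2.0}
TRAINED_BIAS = 4.0


def predict_units(features):
    """Estimate units sold for one day from that day's conditions."""
    score = TRAINED_BIAS
    for name, weight in TRAINED_WEIGHTS.items():
        score += weight * features[name]
    return max(0.0, score)  # a forecast can't go below zero units


# New data the model never saw during training.
sunny_day = {"max_temp_c": 24, "rain_mm": 0.0}
rainy_day = {"max_temp_c": 16, "rain_mm": 12.5}

print(predict_units(sunny_day))  # 40.0
print(predict_units(rainy_day))  # 3.0
```

Notice there is no loop over the historical data here: the expensive learning has already happened, so each prediction is just a handful of multiplications.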

The full database of information from all of these potentially real-time sources is really large, though. Although we can afford to spend a lot of time on training (a lot of businesses run this piece at night), we want a much faster response when doing inference. If we run the model on the entire data set every time, it will take far too long.

So inference requires the data to be more easily accessible. The data is given structure by essentially separating it into groups, which makes it much easier for the model to find the answer.

I’ve seen this stage described as similar to the difference between a BMP and a JPEG for image compression. The BMP has to store the colour of every single pixel, which means you have an exact copy of the image, but the file becomes too large to deal with. A JPEG, however, describes areas of similar colour more compactly, which gives you a much smaller file at almost the same perceived quality.

As you can imagine, the kind of parallel processing or multitasking a GPU can do is a big bonus for the inference stage and makes a dramatic difference.


For the more tech-savvy, there’s a brilliant short video on the NVIDIA Developer YouTube Channel which quickly explains some of the functionality and even has a demo! It’s called “Training and Inferencing with NVIDIA AI Enterprise on VMware vSphere”, and NVIDIA describes it like this: “NVIDIA AI Enterprise is a comprehensive suite of software that’s optimized, certified and supported on VMware vSphere 7 and industry-leading servers. This end-to-end platform is designed to provide the easiest solution for enterprises to enable AI in their core data center. In this demo, we’ll show some examples of the features and capabilities of this stack, as well as some example workloads, such as multi-node AI training, as well as multiple inferencing workloads.”

Thanks for reading this blog on Training and Inference. I’d love to hear about other examples and experiences you’ve had with Training and Inference for Machine Learning. Reach out on LinkedIn!