
Machine Learning with GPUs on vSphere

The VMware team had a productive summer publishing studies of Machine Learning workloads running with GPUs on vSphere. Here is a roundup of the blog articles published in the last few months.

This series goes over the various options for using GPUs with vSphere, and compares and contrasts the capabilities and benefits of each.

This article is a short introduction to a more detailed performance study of GPU sharing with NVIDIA GRID.

Undergraduate students at the University of California, Berkeley participated in this project in collaboration with VMware to develop three real-world Machine Learning use cases.

This series shows test results for different ways to share GPUs among workloads using technology from Bitfusion, which lets workloads access GPUs installed on other hosts.

This article demonstrates how you can get the best of both worlds by combining virtual machines with Singularity containers: the benefits of container packaging, plus the ability that virtualization provides to share GPU power among multiple workloads.

vSphere as a Data Science Platform

A few themes become clear when viewing these articles as a whole:

  • Performance of Machine Learning workloads using GPUs is not compromised when running on vSphere. In fact, you can often achieve better aggregate performance, i.e. throughput across many jobs, by running on vSphere than on bare metal.

  • A key benefit of running GPU-based Machine Learning workloads on vSphere is the ability to allocate GPU resources flexibly and dynamically. This can be done by using NVIDIA GRID technology to share a single GPU among multiple jobs on one host, or by using Bitfusion to marshal the power of many GPUs for one job.
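
To make the distinction between per-job speed and aggregate throughput concrete, here is a minimal Python sketch. The function name and every number in it are hypothetical, chosen only for illustration; they are not measurements from any of the studies above. It assumes a simple model in which jobs run in fixed-size batches and each job in a batch takes the same time.

```python
import math


def aggregate_throughput(num_jobs: int, concurrent_jobs: int, seconds_per_job: float) -> float:
    """Return jobs completed per hour under a simple batch model.

    Assumption (illustrative only): `concurrent_jobs` jobs run at once,
    and every job takes `seconds_per_job` while sharing the GPU.
    """
    batches = math.ceil(num_jobs / concurrent_jobs)
    total_seconds = batches * seconds_per_job
    return num_jobs * 3600 / total_seconds


# Hypothetical scenario A: one job at a time, 100 s per job.
dedicated = aggregate_throughput(num_jobs=8, concurrent_jobs=1, seconds_per_job=100.0)

# Hypothetical scenario B: four jobs share one GPU, each slowed to 250 s.
shared = aggregate_throughput(num_jobs=8, concurrent_jobs=4, seconds_per_job=250.0)

print(f"dedicated: {dedicated:.1f} jobs/hour, shared: {shared:.1f} jobs/hour")
```

In this made-up scenario each shared job runs 2.5 times slower, yet the shared setup finishes more jobs per hour because four jobs make progress at once. Whether sharing wins in practice depends on how much each job actually slows down, which is what the performance studies measure.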

In summary, vSphere provides the ideal software infrastructure for running an enterprise-class data science platform. You can stay up to date on the latest from VMware by bookmarking the pages for Machine Learning articles and Machine Learning resources.