The VMware team had a very productive summer publishing studies of Machine Learning workloads running on GPUs with vSphere. Here is a roundup of the blog articles published in the last few months.
This series goes over the various options for using GPUs with vSphere, and compares and contrasts the capabilities and benefits of each.
- Using GPUs with Virtual Machines on vSphere – Part 1: Overview
- Using GPUs with Virtual Machines on vSphere – Part 2: VMDirectPath I/O
- Using GPUs with Virtual Machines on vSphere – Part 3: Installing the NVIDIA GRID Technology
- Using GPUs with Virtual Machines on vSphere – Part 4: Working with BitFusion FlexDirect
This article is a short introduction to a more detailed performance study of sharing GPUs using NVIDIA GRID.
- Sharing GPUs for Machine Learning/Deep Learning on vSphere with NVIDIA GRID – Performance Considerations
Undergraduate students at the University of California, Berkeley participated in this project in collaboration with VMware to develop three real-world Machine Learning use cases.
This series shows test results for different ways to share GPUs among workloads using the innovative technology from Bitfusion, which lets workloads access GPUs installed on other hosts.
- Machine Learning leveraging NVIDIA GPUs with Bitfusion on VMware vSphere (Part 1 of 2)
- Machine Learning leveraging NVIDIA GPUs with Bitfusion on VMware vSphere (Part 2 of 2)
This article demonstrates how you can get the best of both worlds with virtual machines and Singularity containers: the benefits of container packaging, combined with the ability that virtualization provides to share GPU power among multiple workloads.
vSphere as a Data Science Platform
A few themes become clear when viewing these articles as a whole:
Performance of Machine Learning workloads using GPUs is by no means compromised when running on vSphere. In fact, you can often achieve better aggregate performance, i.e. throughput of many jobs, by running on vSphere than on bare metal.
A key benefit of running GPU-based Machine Learning workloads on vSphere is the ability to allocate GPU resources in a very flexible and dynamic way. This can be done by using NVIDIA GRID technology to share a single GPU among multiple jobs on one host, or by using Bitfusion to marshal the power of many GPUs for one job.
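To make the idea of aggregate performance in the first theme concrete, here is a minimal sketch of the arithmetic. The function name and all of the numbers are hypothetical and chosen only for illustration; they are not results from any of the studies listed above. The sketch assumes that every job sharing a GPU slows down by the same factor.

```python
def aggregate_throughput(jobs_sharing: int, solo_job_seconds: float, slowdown: float) -> float:
    """Return jobs completed per hour on one GPU.

    jobs_sharing: number of jobs running concurrently on the GPU.
    solo_job_seconds: run time of one job when it has the GPU to itself.
    slowdown: factor by which each job's run time grows when sharing
              (1.0 means no slowdown). Assumed equal for all jobs.
    """
    if jobs_sharing < 1 or solo_job_seconds <= 0 or slowdown < 1.0:
        raise ValueError("invalid input")
    shared_job_seconds = solo_job_seconds * slowdown
    # All jobs run concurrently, so each batch of `jobs_sharing` jobs
    # finishes after `shared_job_seconds`.
    return jobs_sharing * 3600.0 / shared_job_seconds


# Hypothetical numbers, not measurements:
dedicated = aggregate_throughput(jobs_sharing=1, solo_job_seconds=600, slowdown=1.0)
shared = aggregate_throughput(jobs_sharing=4, solo_job_seconds=600, slowdown=2.5)
print(f"dedicated GPU: {dedicated:.1f} jobs/hour, shared GPU: {shared:.1f} jobs/hour")
```

Under these made-up inputs, each individual job takes longer when the GPU is shared, yet more jobs finish per hour overall. That is the sense in which aggregate throughput can improve even when single-job latency does not.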
In summary, vSphere provides the ideal software infrastructure for running an enterprise-class data science platform. You can always stay up to date on the latest from VMware by bookmarking the pages for Machine Learning articles and Machine Learning resources.