vSphere 6.7: Suspend and Resume of GPU-attached Virtual Machines

VMware vSphere is the Universal Application Platform of choice for deployment of all kinds of applications – from end-user virtual desktops to very large databases, machine learning systems and in-memory data management systems. vSphere 6.7 became generally available in April 2018. This article describes a set of use cases and a demonstration for a new feature in vSphere 6.7 related to GPU use in virtual machines.

The techniques described here were tested at VMware and at customer sites where they are deployed today. The new vSphere 6.7 feature that we focus on here allows the user to suspend an NVIDIA GRID vGPU-enabled virtual machine on a host server and to resume the same virtual machine from that point at a later time. This enables the virtualized GPU to be used over time in different contexts and for completely different applications, from machine learning to graphics processing. An example of this would be using a set of GPUs for graphics acceleration on virtual desktop VMs during regular work hours and then using the same GPUs for machine learning virtual machines in the evening and night-time hours.

Background

A GPU (Graphics Processing Unit) is a programmable logic chip (processor) that was initially designed for high-performance graphics on personal computers. GPUs were initially built to render images, animations and video for computer graphics. High-end GPUs have thousands of processor cores, compared to traditional CPUs that have tens of cores. GPUs can also be used to speed up massively parallel mathematical operations and processing. For that last reason, they have become very popular in the fields of machine learning and deep learning.

Enterprise Uses of GPUs

Virtual Desktops

NVIDIA, in collaboration with VMware, has helped move advanced graphics functionality from the individual workstation to the data center, making immersive 3D graphics available to remote users via virtual desktops. This solution enables users with the most demanding graphics needs, such as 3D engineering applications, to take advantage of the superior computing, storage, and networking power of the data center while freeing them from the limitations of the physical workstation. It also gives enterprises much better control over where their sensitive intellectual property is worked on, by keeping it in the data center instead of on employees’ personal devices.

General Purpose GPUs (GPGPUs)

A general-purpose GPU (GPGPU) is a graphics processing unit (GPU) that performs non-specialized calculations that would typically be conducted by the CPU (central processing unit). Some application areas include facial recognition in images, medical diagnosis in MRIs, robotics, automobile safety, and text and speech recognition. The use of GPUs for handling massive floating-point computation that can be parallelized on many cores is a strong trend in the high performance computing (HPC) virtualization and cloud area, particularly for accelerating deep learning workloads.

GPU Support in vSphere

vSphere virtualization has been at the forefront of infrastructure optimization and sharing for traditional CPU-based workloads. The use of GPUs in virtual machines has been supported since vSphere 5.5 through VMware vSphere’s DirectPath I/O technology, while NVIDIA GRID vGPU™ has been enabled since vSphere 6.0.

GPUs can be deployed on vSphere in two separate modes, each of which has its uses:

DirectPath I/O Pass-through (supported since vSphere 5.5)

In this mode of using virtualized GPUs, the host is configured to have one or more GPUs in a DirectPath I/O pass-through mode. When using this method, only one virtual machine can use a GPU device at any one time. A virtual machine can have more than one GPU in pass-through mode. The GPUs work in this mode provided that the correct drivers exist at the virtual machine’s operating system level.

NVIDIA GRID vGPU™ Mode

NVIDIA GRID vGPU enables multiple virtual machines to share a single physical NVIDIA GPU. It requires two drivers: an NVIDIA GRID hypervisor driver, installed in ESXi in the form of a vSphere Installation Bundle (VIB), and a guest operating system driver inside each Windows or Linux virtual machine.

In this mode, vGPUs are defined using profiles that specify how much GPU memory to assign to a vGPU, from a partial vGPU up to a single whole vGPU that is equal in size to the physical GPU.
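As a rough illustration of how a profile maps to a share of the physical GPU, the following Python sketch parses a profile name and computes the fraction of GPU memory it claims. It assumes that the number in a profile name such as "grid-p40-12q" is the vGPU memory in GB and that the physical Tesla P40 has 24 GB of memory. The helper names and the lookup table are hypothetical and are not part of any VMware or NVIDIA API.

```python
import re
from fractions import Fraction

# Assumed physical GPU memory in GB, keyed by the board name used in the
# profile string. Only the P40 entry is needed for this article.
PHYSICAL_GPU_MEMORY_GB = {"p40": 24}

# Assumed naming pattern: grid-<board>-<memory in GB><profile type letter>
PROFILE_PATTERN = re.compile(
    r"^grid-(?P<board>[a-z0-9]+)-(?P<gb>\d+)(?P<kind>[a-z])$"
)


def parse_profile(name):
    """Split a profile name into (board, memory_gb, type_letter)."""
    match = PROFILE_PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognized profile name: {name!r}")
    return match["board"], int(match["gb"]), match["kind"]


def gpu_share(name):
    """Fraction of the physical GPU's memory that one vGPU of this profile uses."""
    board, memory_gb, _ = parse_profile(name)
    total_gb = PHYSICAL_GPU_MEMORY_GB[board]
    if memory_gb > total_gb:
        raise ValueError(f"{name!r} asks for more memory than the GPU has")
    return Fraction(memory_gb, total_gb)


def max_vms_per_gpu(name):
    """How many vGPUs of this profile fit on one physical GPU."""
    return int(1 / gpu_share(name))
```

Under these assumptions, `gpu_share("grid-p40-12q")` evaluates to one half and `max_vms_per_gpu("grid-p40-12q")` to two, which matches the half-GPU profile used in the first demonstration.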

GPU Sharing

The two main use cases for GPUs in enterprises that we consider here, GPUs in virtual desktops and general purpose GPUs, have different usage profiles. Virtual desktops are typically used during the business day and GPGPU type of workloads can be batched and run during non-business hours. These different profiles provide the opportunity to share and optimize the GPU resources in the datacenter. To be able to share GPU resources across these disparate workloads with minimal disruption, the virtual machines with GPUs enabled need to have the ability to be suspended and resumed during different times of the day.
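The time-of-day sharing idea can be sketched as a small planning function. This is a minimal illustration, not a vSphere feature: the business-hours window (08:00 to 18:00), the workload labels, and the VM names are all assumptions chosen for the example, and the function only decides which virtual machines should be suspended and which resumed at a given time.

```python
from datetime import time

# Assumed business-hours window; adjust to the site's actual schedule.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def active_workload(now):
    """Return the workload class that should hold the GPUs at this time."""
    if BUSINESS_START <= now < BUSINESS_END:
        return "desktop"
    return "gpgpu"


def plan_transitions(now, vms):
    """Decide which VMs to suspend and which to resume.

    vms maps a VM name to its workload class ("desktop" or "gpgpu").
    Returns (to_suspend, to_resume) as sorted lists of VM names.
    Suspends should be carried out first so that GPU capacity is free
    before the other group is resumed.
    """
    active = active_workload(now)
    to_resume = sorted(name for name, kind in vms.items() if kind == active)
    to_suspend = sorted(name for name, kind in vms.items() if kind != active)
    return to_suspend, to_resume
```

For example, with two hypothetical desktops and two hypothetical machine learning VMs, calling `plan_transitions(time(20, 0), vms)` would list the desktops for suspension and the machine learning VMs for resumption.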

A new feature in vSphere 6.7 provides the capability to pause (i.e. “suspend”) and resume one or more virtual machines with their allocated GPU resources. In the first demonstration, we show the suspending of two virtual machines that represent virtual desktops.

Demo Video 1 – efficient use of GPU resources

The video shows the suspend/resume capabilities in a scenario where sharing of the limited GPU resources is taking place between two VMware Horizon desktop virtual machines (the end-user graphics use case) and two TensorFlow virtual machines (an example of the GPGPU use case).

The demonstration shows the use of a “GPU Profile” within the vSphere Client management console. The particular GPU profile chosen is the “grid-p40-12q” profile, which allows a virtual machine to use half the resources of the GPU device. One scenario shown is a pair of virtual desktop machines, each using half the GPU resources. When the GPU is fully used like this, other virtual machines that require it cannot be powered on. By suspending one of the virtual desktop machines, a completely separate virtual machine that uses the TensorFlow machine learning framework can operate and consume the freed-up half of the GPU resource.
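The capacity reasoning in this scenario can be modeled in a few lines of Python. This is a simplified sketch, not how ESXi places vGPUs: it tracks only GPU memory, assumes a 24 GB physical GPU with 12 GB per half-GPU profile, and uses a hypothetical `PhysicalGpu` class and VM names invented for the example.

```python
class PhysicalGpu:
    """Toy model of one physical GPU shared by vGPU-enabled VMs."""

    def __init__(self, memory_gb):
        self.memory_gb = memory_gb
        self.running = {}    # VM name -> GB of GPU memory currently held
        self.suspended = {}  # VM name -> GB it will need again on resume

    def free_gb(self):
        """GPU memory not held by any running VM."""
        return self.memory_gb - sum(self.running.values())

    def power_on(self, vm, profile_gb):
        """Start a VM if its profile fits; return True on success."""
        if profile_gb > self.free_gb():
            return False
        self.running[vm] = profile_gb
        return True

    def suspend(self, vm):
        """Suspend a running VM, releasing its share of the GPU."""
        self.suspended[vm] = self.running.pop(vm)

    def resume(self, vm):
        """Resume a suspended VM if its share is free again."""
        if self.suspended[vm] > self.free_gb():
            return False
        self.running[vm] = self.suspended.pop(vm)
        return True


# Walk through the scenario from the demonstration (names are illustrative).
gpu = PhysicalGpu(memory_gb=24)
gpu.power_on("desktop-1", 12)                   # first half of the GPU
gpu.power_on("desktop-2", 12)                   # second half, GPU now full
blocked = not gpu.power_on("tensorflow-1", 12)  # no capacity left
gpu.suspend("desktop-2")                        # frees half of the GPU
started = gpu.power_on("tensorflow-1", 12)      # now fits
```

In this model the suspended desktop cannot be resumed until the TensorFlow virtual machine has itself been suspended or powered off, which mirrors the hand-over between the two workload types.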

Demo Video 2 – reduce maintenance hurdles

This second video, produced by our partner NVIDIA, shows how maintenance-induced end-user disruption is reduced for NVIDIA GRID vGPU-powered infrastructure. In the video, a typical maintenance flow is carried out on NVIDIA-powered VDI infrastructure. In the past, these operations required end-user desktops to be shut down completely. Starting with vSphere 6.7, end-user desktops can be suspended during maintenance and resumed upon its completion. This leaves the end-user desktops in their original state, including any GPU processing, application and desktop state.

Conclusions

VMware vSphere, a leading universal application platform, is optimized to enable GPU workloads and to allow you to share those GPUs across different virtual machines. Customers use GPUs in their virtualized infrastructure for different types of workloads in different virtual machines today. The GPUs can either be dedicated to one virtual machine or shared in different ways across virtual machines. One of those methods of sharing virtualized GPUs is the NVIDIA GRID vGPU approach.

We show the suspending of one or more virtual machines that are vGPU-enabled and the power-on of a different set of virtual machines to make use of the same vGPUs. This suspend/resume capability is a new feature that is available in vSphere 6.7. This capability maximizes the use of your vGPU infrastructure in order to save costs and get better efficiency from your physical GPU-capable infrastructure.

Co-authors: Mohan Potheri and Ziv Kalmanovich