
GPGPU Computing in a VM

Periodically I see people on forums asking whether ESX supports GPGPU computing, which has prompted me to write a brief post on the subject:

YES! VMware ESX supports GPGPU computing. Applications can access GPGPUs via CUDA or OpenCL in exactly the same way as when running natively — no changes are required to the application.

Recent versions of ESX (4.0 onwards) support a feature termed VMDirectPath I/O, which allows guest operating systems to access PCI devices directly, and this feature can be used to achieve GPGPU computing in a VM. As one would expect with direct communication, the performance overheads are minimal, providing close-to-native performance.

The process for setting up access to a GPU from a VM is fairly simple:

(1) Requires a VT-d-capable system (or a system with an AMD IOMMU)

- Nehalem-class processor or later

- VT-d performs the remapping of I/O DMA transfers and device-generated interrupts needed for safe operation in the virtualized environment

- Ensures guest isolation

(2) Requires a “direct passthrough”-capable GPGPU

- Most NVIDIA Quadro cards are passthrough capable

- Recent AMD cards are also passthrough capable

(3) Ensure the I/O MMU is enabled in the system BIOS

(4) Enable VMDirectPath I/O via the vSphere client or the ESX command line

- Enable the GPGPU as a passthrough device

- Add the GPGPU to the chosen VM

(5) Install the vendor’s latest graphics drivers in the VM

(6) Download the GPGPU SDK

(7) Run your CUDA/OpenCL apps! (A minimal sanity check is sketched below.)

[N.B. See VMware KB1010789 for more configuration details]
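
As a quick sanity check after step (7), something along the lines of the following minimal CUDA device query will confirm that the VM can see the passthrough GPU. This is just an illustrative sketch; the deviceQuery sample shipped with the CUDA SDK does the same job more thoroughly:

// devicecheck.cu - sanity check that the passthrough GPU is visible in the VM.
// Illustrative sketch only. Build with: nvcc devicecheck.cu -o devicecheck
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Found %d CUDA device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s (%d MB global memory)\n",
               i, prop.name, (int)(prop.totalGlobalMem >> 20));
    }
    return 0;
}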

I personally undertook some testing prior to writing this post:

- Successfully tested with both Windows and Linux VMs (32- and 64-bit)

- Successfully tested both CUDA and OpenCL frameworks on NVIDIA Quadro GPUs

- Successfully tested OpenCL on AMD GPUs

Additionally, I spent some time investigating the performance overheads associated with performing GPGPU computing from within a VM. Once data has been loaded into the GPU’s local memory and a processing operation has been initiated, the GPU operates largely independently of the OS; any performance overheads are therefore likely to be most significant when moving data back and forth between the VM and the GPU and when initiating commands. Accordingly, I focused on these scenarios and found the following:

- Data transfer back and forth between host and GPU is close to native performance

  • Found to be true even for small data chunks (e.g. 1KB)

- Tested a number of short-duration microkernels from a wide range of application spaces and found no noticeable performance impact

  • DSP to financial to computational fluid dynamics (and beyond)

- VM performance was typically observed at 98%+ of native performance

Operation close to native performance is not unexpected: the direct communication between VM and GPU provided by VMDirectPath I/O ensures the hypervisor doesn’t get involved in each and every interaction (which would add overhead), allowing even short-duration operations to run at close to native performance. A sketch of the kind of transfer test involved is shown below.
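
For the curious, the small-chunk transfer tests were of the following flavour. This is an illustrative sketch rather than the exact harness behind the numbers above: it times pinned-memory round trips of a fixed chunk size using CUDA events:

// xferbench.cu - sketch of a host<->GPU transfer timing loop.
// Illustrative only; not the exact harness used for the results above.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1024;   // e.g. a 1KB chunk, as in the small-transfer test
    const int    iters = 10000;

    char *h_buf, *d_buf;
    cudaMallocHost((void**)&h_buf, bytes);  // pinned host memory for realistic DMA
    cudaMalloc((void**)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) {
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds
    printf("%zu-byte round trip: %.3f us average\n", bytes, (ms * 1000.0f) / iters);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}

Running the same binary natively and inside the VM gives a like-for-like comparison of per-transfer overhead.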

So, in summary, ESX supports GPGPU computing in a VM, at extremely close to native performance!