Web/Tech Virtualization

Performance Comparison of Containerized Machine Learning Applications Running Natively with Nvidia vGPUs vs. in a VM – Episode 4

This article is by Hari Sivaraman, Uday Kurkure, and Lan Vu from the Performance Engineering team at VMware.

Performance Comparison of Containerized Machine Learning Applications

Docker containers [6] are rapidly becoming a popular environment in which to run different applications, including those in machine learning [1, 2, 3]. NVIDIA supports Docker containers with their own Docker engine utility, nvidia-docker [7], which is specialized to run applications that use NVIDIA GPUs.

The nvidia-docker container for machine learning includes the application and the machine learning framework (for example, TensorFlow [5]) but, importantly, it does not include the GPU driver or the CUDA toolkit.

Docker containers are hardware agnostic so, when an application uses specialized hardware like an NVIDIA GPU that needs kernel modules and user-level libraries, the container cannot include the required drivers. They live outside the container.

One workaround here is to install the driver inside the container and map its devices upon launch. This workaround is not portable since the versions inside the container need to match those in the native operating system.

The nvidia-docker engine utility provides an alternate mechanism that mounts the user-mode components at launch, but this requires you to install the driver and CUDA in the native operating system before launch. Both approaches have drawbacks, but the latter is clearly preferable.

In this episode of our series of blogs [8, 9, 10] on machine learning in vSphere using GPUs, we present a comparison of the performance of MNIST [4] running in a container on CentOS executing natively with MNIST running in a container inside a CentOS VM on vSphere. Based on our experiments, we demonstrate that running containers in a virtualized environment, like a CentOS VM on vSphere, suffers no performance penlty, while benefiting from the tremenduous management capabilities offered by the VMware vSphere platform.

Experiment Configuration and Methodology

We used MNIST [4] to compare the performance of containers running natively with containers running inside a VM. The configuration of the VM and the vSphere server we used for the “virtualized container” is shown in Table 1. The configuration of the physical machine used to run the container natively is shown in Table 2.

vSphere  6.0.0, build 3500742
Nvidia vGPU driver 367.53
Guest OS CentOS Linux release 7.4.1708 (Core)
CUDA driver 8.0
CUDA runtime 7.5
Docker 17.09-ce-rc2

Table 1. Configuration of VM used to run the nvidia-docker container

Nvidia driver 384.98
Operating system CentOS Linux release 7.4.1708 (Core)
CUDA driver 8.0
CUDA runtime 7.5
Docker 17.09-ce-rc2

⇑ Table 2. Configuration of physical machine used to run the nvidia-docker container

The server configuration we used is shown in Table 3 below. In our experiments, we used the NVIDIA M60 GPU in vGPU mode only. We did not use the Direct I/O mode. In the scenario in which we ran the container inside the VM, we first installed the NVIDIA vGPU drivers in vSphere and inside the VM, then we installed CUDA (driver 8.0 with runtime version 7.5), followed by Docker and nvidia-docker [7]. In the case where we ran the container natively, we installed the NVIDIA driver in CentOS running natively, followed by CUDA (driver 8.0 with runtime version 7.5),  Docker and finally, nvidia-docker [7]. In both scenarios we ran MNIST and we measured the run time for training using a wall clock.

 Figure 1. Testbed configuration for comparison of the performance of containers running natively vs. running in a VM

Model Dell PowerEdge R730
Processor type Intel® Xeon® CPU E5-2680 v3 @ 2.50GHz
CPU cores 24 CPUs, each @ 2.5GHz
Processor sockets 2
Cores per socket 14
Logical processors 48
Hyperthreading Active
Memory 768GB
Storage Local SSD (1.5TB), Storage Arrays, Local Hard Disks
GPUs 2x M60 Tesla

⇑ Table 3. Server configuration

Results

The measured wall-clock run times for MNIST are shown in Table 4 for the two scenarios we tested:

  1. Running in an nvidia-docker container in CentOS running natively.
  2. Running in an nvidia-docker container inside a CentOS VM on vSphere.

From the data, we can clearly see that there is no measurable performance penalty for running a container inside a VM as compred to running it natively.

Configuration Run time for MNIST as measured by a wall clock
Nvidia-docker container in CentOS running natively 44 minutes 53 seconds
Nvidia-docker container running in a CentOS VM on vSphere 44  minutes 57 seconds

⇑ Table 4. Comparison of the run-time for MNIST running in a container on native CentOS vs. in a container in virtualized CentOS

Takeaways

  • Based on the results shown in Table 4, it is clear that there is no measurable performance impact due to running a containerized application in a virtual environment as opposed to running it natively. So, from a performance perspective, there is no penalty for using a virtualized environment.
  • It is important to note that since containers do not include the GPU driver or the CUDA environment, both of these components need to be installed separately. It is in this aspect that a virtualized environment offers a superior user experience; an nvidia-docker container in CentOS running natively requires that any existing GPU and CUDA drivers be removed if the version of the drivers does not match that required by the container. Uninstalling and re-installing the correct drivers is often a challenging and time consuming task. However, in a virtualized environment, you can, in advance, create and store in a repository, a number of CentOS VMs with different VGPU and CUDA drivers. When you need to run an application in an nvidia-docker container, just clone the VM with the correct drivers, load the container, and run with no performance penalty. In such a scenario, running in a virtualized environment does not require you to uninstall and re-install the correct drivers, which saves both time and considerable frustration. This issue of uninstalling and re-installing drivers in a native environment becomes considerably more difficult if there are multiple container users on the system; in such a scenario, all the containers need to be migrated to use the new drivers, or the user who needs a new driver will have to wait until all the other users are done before a system administrator can upgrade the GPU drivers on the native CentOS.

Future Work

In this blog, we presented the performance results of running MNIST in a single container. We plan to run MNIST in multiple containers running concurrently in both a virtualized environment and on CentOS executing natively, and report the measured run times. This will provide a comparison of the performance as we scale up the number of containers.

References

  1. Google Cloud Platform: Cloud AI. https://cloud.google.com/products/machine-learning/
  2. Wikipedia: Deep Learning. https://en.wikipedia.org/wiki/Deep_learning
  3. NVIDIA GPUs – The Engine of Deep Learning. https://developer.nvidia.com/deep-learning
  4. The MNIST Database of Handwritten Digits. http://yann.lecun.com/exdb/mnist/
  5. TensorFlow: An Open-Source Software Library for Machine Intelligence. https://www.tensorflow.org
  6. Wikipedia: Operating-System-Level Virtualization. https://en.wikipedia.org/wiki/Operating-system-level_virtualization
  7. NVIDIA Docker: GPU Server Application Deployment Made Easy. https://devblogs.nvidia.com/parallelforall/nvidia-docker-gpu-server-application-deployment-made-easy/
  8. Episode 1: Performance Results of Machine Learning with DirectPath I/O and GRID vGPU. https://blogs.vmware.com/performance/2016/10/machine-learning-vsphere-nvidia-gpus.html
  9. Episode 2: Machine Learning on vSphere 6 with NVIDIA GPUs. https://blogs.vmware.com/performance/2017/03/machine-learning-vsphere-6-5-nvidia-gpus-episode-2.html
  10. Episode 3: Performance Comparison of Native GPU to Virtualized GPU and Scalability of Virtualized GPUs for Machine Learning. https://blogs.vmware.com/performance/2017/10/episode-3-performance-comparison-native-gpu-virtualized-gpu-scalability-virtualized-gpus-machine-learning.html