Performance Comparison of Containerized Machine Learning Applications
This article is by Hari Sivaraman, Uday Kurkure, and Lan Vu from the Performance Engineering team at VMware.
Docker containers [6] are rapidly becoming a popular environment in which to run different applications, including those in machine learning [1, 2, 3]. NVIDIA supports Docker containers with its own Docker engine utility, nvidia-docker [7], which is specialized to run applications that use NVIDIA GPUs.
The nvidia-docker container for machine learning includes the application and the machine learning framework (for example, TensorFlow [5]) but, importantly, it does not include the GPU driver or the CUDA toolkit.
Docker containers are hardware agnostic, so when an application uses specialized hardware such as an NVIDIA GPU, which requires kernel modules and user-level libraries, the container cannot include the required drivers; they must live outside the container.
One workaround is to install the driver inside the container and map the GPU device nodes into the container at launch. This approach is not portable, because the driver version inside the container must match the version in the native operating system.
The nvidia-docker engine utility provides an alternative mechanism that mounts the user-mode driver components from the host into the container at launch, but it requires the driver and CUDA to be installed in the native operating system beforehand. Both approaches have drawbacks, but the latter is clearly preferable.
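To make the two launch mechanisms concrete, here is a minimal sketch using the Docker SDK for Python (docker-py). It is for illustration only: the image names, script name, and device paths are hypothetical placeholders, and the second call assumes the NVIDIA container runtime is available (the nvidia-docker command-line utility provides the equivalent behavior).

```python
# Minimal sketch contrasting the two launch mechanisms using docker-py.
# Image names, script name, and device paths are hypothetical placeholders.
import docker

client = docker.from_env()

# Workaround 1: the driver is baked into the image and the GPU device nodes
# are mapped into the container at launch. Fragile, because the driver version
# inside the image must match the kernel module on the host.
logs = client.containers.run(
    "example/mnist-with-driver:latest",
    "python mnist_train.py",
    devices=[
        "/dev/nvidia0:/dev/nvidia0:rwm",
        "/dev/nvidiactl:/dev/nvidiactl:rwm",
        "/dev/nvidia-uvm:/dev/nvidia-uvm:rwm",
    ],
    remove=True,
)

# Workaround 2: the image ships only the framework; the host's user-mode
# driver components are injected at launch (here via the NVIDIA container
# runtime; the nvidia-docker utility provides the same behavior from the CLI).
logs = client.containers.run(
    "tensorflow/tensorflow:latest-gpu",
    "python mnist_train.py",
    runtime="nvidia",
    remove=True,
)
print(logs.decode())
```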
In this episode of our series of blogs [8, 9, 10] on machine learning in vSphere using GPUs, we compare the performance of MNIST [4] running in a container on natively executing CentOS with MNIST running in a container inside a CentOS VM on vSphere. Based on our experiments, we demonstrate that running containers in a virtualized environment, such as a CentOS VM on vSphere, incurs no performance penalty, while benefiting from the extensive management capabilities offered by the VMware vSphere platform.
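For readers unfamiliar with the benchmark, the sketch below shows roughly what a TensorFlow MNIST training script looks like, using the tf.keras API. It is a minimal illustration only, not the exact model or script we benchmarked.

```python
# Minimal MNIST training sketch in TensorFlow (tf.keras); for illustration
# only, not the exact model or script used in our benchmark runs.
import tensorflow as tf

# Load the MNIST handwritten-digit dataset and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected classifier over the 28x28 images.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```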
Experiment Configuration and Methodology
We used MNIST [4] to compare the performance of containers running natively with containers running inside a VM. The configuration of the VM and the vSphere server we used for the “virtualized container” is shown in Table 1. The configuration of the physical machine used to run the container natively is shown in Table 2.
| Component | Version |
|---|---|
| vSphere | 6.0.0, build 3500742 |
| NVIDIA vGPU driver | 367.53 |
| Guest OS | CentOS Linux release 7.4.1708 (Core) |
| CUDA driver | 8.0 |
| CUDA runtime | 7.5 |
| Docker | 17.09-ce-rc2 |

⇑ Table 1. Configuration of the VM used to run the nvidia-docker container
| Component | Version |
|---|---|
| NVIDIA driver | 384.98 |
| Operating system | CentOS Linux release 7.4.1708 (Core) |
| CUDA driver | 8.0 |
| CUDA runtime | 7.5 |
| Docker | 17.09-ce-rc2 |

⇑ Table 2. Configuration of the physical machine used to run the nvidia-docker container
The server configuration we used is shown in Table 3 below. In our experiments, we used the NVIDIA Tesla M60 GPU in vGPU mode only; we did not use DirectPath I/O mode. In the scenario in which we ran the container inside the VM, we first installed the NVIDIA vGPU drivers in vSphere and inside the VM, then CUDA (driver 8.0 with runtime version 7.5), followed by Docker and nvidia-docker [7]. In the scenario in which we ran the container natively, we installed the NVIDIA driver in the natively running CentOS, followed by CUDA (driver 8.0 with runtime version 7.5), Docker, and finally nvidia-docker [7]. In both scenarios we ran MNIST and measured the wall-clock run time for training (a sketch of such a timed run appears after Table 3).
⇑ Figure 1. Testbed configuration for comparison of the performance of containers running natively vs. running in a VM
| Component | Specification |
|---|---|
| Model | Dell PowerEdge R730 |
| Processor type | Intel® Xeon® CPU E5-2680 v3 @ 2.50GHz |
| CPU cores | 24 CPUs, each @ 2.5GHz |
| Processor sockets | 2 |
| Cores per socket | 14 |
| Logical processors | 48 |
| Hyperthreading | Active |
| Memory | 768GB |
| Storage | Local SSD (1.5TB), storage arrays, local hard disks |
| GPUs | 2× NVIDIA Tesla M60 |

⇑ Table 3. Server configuration
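As a rough illustration of the measurement described above, a containerized training run can be timed with a small wrapper like the one below. This is a sketch only: the image tag and training-script path are hypothetical placeholders, not the ones used in our tests.

```python
# Wall-clock timing sketch for a containerized MNIST training run.
# The image tag and script path below are hypothetical placeholders.
import subprocess
import time

cmd = [
    "nvidia-docker", "run", "--rm",
    "tensorflow/tensorflow:latest-gpu",     # hypothetical image tag
    "python", "/workspace/mnist_train.py",  # hypothetical training script
]

start = time.monotonic()
subprocess.run(cmd, check=True)             # blocks until training finishes
elapsed = time.monotonic() - start

print(f"MNIST training wall-clock time: {elapsed / 60.0:.1f} minutes")
```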
Results
The measured wall-clock run times for MNIST are shown in Table 4 for the two scenarios we tested:
- Running in an nvidia-docker container in CentOS running natively.
- Running in an nvidia-docker container inside a CentOS VM on vSphere.
From the data, we can clearly see that there is no measurable performance penalty for running a container inside a VM as compared to running it natively.
| Configuration | Wall-clock run time for MNIST |
|---|---|
| nvidia-docker container in CentOS running natively | 44 minutes 53 seconds |
| nvidia-docker container in a CentOS VM on vSphere | 44 minutes 57 seconds |

⇑ Table 4. Comparison of the run time for MNIST running in a container on native CentOS vs. in a container in virtualized CentOS
Takeaways
- Based on the results shown in Table 4, there is no measurable performance impact from running a containerized application in a virtual environment as opposed to running it natively; from a performance perspective, there is no penalty for using a virtualized environment.
- It is important to note that since containers include neither the GPU driver nor the CUDA environment, both components must be installed separately. This is where a virtualized environment offers a superior user experience. With an nvidia-docker container on natively running CentOS, any existing GPU and CUDA drivers must be removed if their versions do not match those required by the container, and uninstalling and re-installing the correct drivers is often a challenging and time-consuming task. In a virtualized environment, by contrast, you can create a set of CentOS VMs with different vGPU and CUDA driver versions in advance and store them in a repository. When you need to run an application in an nvidia-docker container, you simply clone the VM with the correct drivers, load the container, and run it with no performance penalty. No drivers need to be uninstalled or re-installed, which saves both time and considerable frustration. The driver-churn problem in a native environment becomes considerably worse when multiple container users share the system: either all containers must be migrated to the new drivers, or the user who needs a new driver must wait until all other users are done before a system administrator can upgrade the GPU drivers on the native CentOS.
Future Work
In this blog, we presented performance results for MNIST running in a single container. We plan to run MNIST in multiple containers executing concurrently, both in a virtualized environment and on natively executing CentOS, and report the measured run times. This will provide a comparison of performance as we scale up the number of containers.
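As a preview of how such a scale-up experiment might be scripted, several identical containers can be launched concurrently and timed individually, as in the sketch below. This is an illustration under assumptions, not necessarily the harness we will use; the image tag and script path are hypothetical placeholders.

```python
# Sketch: launch N identical training containers concurrently, timing each one.
# The image tag and script path are hypothetical placeholders.
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

N_CONTAINERS = 4

def run_one(i):
    cmd = [
        "nvidia-docker", "run", "--rm",
        "tensorflow/tensorflow:latest-gpu",
        "python", "/workspace/mnist_train.py",
    ]
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return i, time.monotonic() - start

with ThreadPoolExecutor(max_workers=N_CONTAINERS) as pool:
    for i, elapsed in pool.map(run_one, range(N_CONTAINERS)):
        print(f"container {i}: {elapsed / 60.0:.1f} minutes")
```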
References
1. Google Cloud Platform: Cloud AI. https://cloud.google.com/products/machine-learning/
2. Wikipedia: Deep Learning. https://en.wikipedia.org/wiki/Deep_learning
3. NVIDIA GPUs – The Engine of Deep Learning. https://developer.nvidia.com/deep-learning
4. The MNIST Database of Handwritten Digits. http://yann.lecun.com/exdb/mnist/
5. TensorFlow: An Open-Source Software Library for Machine Intelligence. https://www.tensorflow.org
6. Wikipedia: Operating-System-Level Virtualization. https://en.wikipedia.org/wiki/Operating-system-level_virtualization
7. NVIDIA Docker: GPU Server Application Deployment Made Easy. https://devblogs.nvidia.com/parallelforall/nvidia-docker-gpu-server-application-deployment-made-easy/
8. Episode 1: Performance Results of Machine Learning with DirectPath I/O and GRID vGPU. https://blogs.vmware.com/performance/2016/10/machine-learning-vsphere-nvidia-gpus.html
9. Episode 2: Machine Learning on vSphere 6 with NVIDIA GPUs. https://blogs.vmware.com/performance/2017/03/machine-learning-vsphere-6-5-nvidia-gpus-episode-2.html
10. Episode 3: Performance Comparison of Native GPU to Virtualized GPU and Scalability of Virtualized GPUs for Machine Learning. https://blogs.vmware.com/performance/2017/10/episode-3-performance-comparison-native-gpu-virtualized-gpu-scalability-virtualized-gpus-machine-learning.html