vSGA, or Virtual Shared Graphics Acceleration, allows multiple VMware vSphere virtual machines to share hardware GPUs. In previous blog articles, we have advocated the use of NVIDIA GRID vGPU technology, which remains a good solution for many use cases. In this blog, we compare the performance of vSGA with that of vGPU, limiting our testing to workloads run in VMware Horizon 7 VDI desktops. Based on our measurements (some of which we present here), vSGA provides performance very close to vGPU for a variety of software applications, including Microsoft Office, Adobe Acrobat, CAD viewers, YouTube video, and viewing or working with WebGL-based images.

All modern PC operating systems expect the device on which they are running to have a GPU to provide a reasonably acceptable level of user experience. VDI desktops, which run on servers in the cloud, need to support 3D graphics to provide a good user experience. One approach to support 3D graphics for VMware Horizon VDI desktops is to use the NVIDIA GRID vGPU solution. An alternate approach is to use the VMware vSGA stack, shown in Figure 1, to support 3D graphics for Horizon VDI desktops.

Figure 1: VMware’s vSGA and NVIDIA GRID vGPU stacks. Both stacks support 3D graphics in vSphere using NVIDIA GPUs.

Based on our experiments, we recommend that VDI desktops use VMware vSGA to support 3D graphics for many applications, including (but not limited to):

  • Microsoft Office applications
  • Adobe Acrobat
  • YouTube/instructional/training video playback
  • CAD viewers
  • WebGL

We still recommend NVIDIA GRID vGPU on vSphere for compute-intensive applications such as 3D modeling/CAD design, high-performance computing (HPC), data science, and AI. See NVIDIA and VMware Enterprise GPU virtualization for details.

In this blog, we compare the quality of user experience in VDI with three different configurations:

  1. A CPU-only desktop
  2. VDI using vSGA
  3. VDI using GRID vGPU

Our results show:

  • vSGA offers a significant improvement in user experience over a CPU-only VDI desktop.
  • vSGA performance, measured in terms of user experience, is very close to that of GRID vGPU.

Testbed Configuration

Figure 2 and Table 1 show our testbed configuration for the experiments we conducted to compare VMware vSGA vs. NVIDIA GRID vGPU solutions.

Figure 2: Testbed configuration for VDI desktop.

Table 1: VDI Desktop VM Configuration.

| Parameter | Value/Configuration |
|---|---|
| vCPUs | 2 |
| Memory | 8 GB |
| Disk | 64 GB |
| OS | Windows 10 Enterprise |
| Applications Installed | Office 2013, Chrome Browser, Adobe Reader |
| VDI Protocol | Blast |
| VRAM | 96 MB |
| vSGA (3D Memory) | 512 MB |
| vGPU Profile | M60-1b |
| VMware Horizon Version | 7.6 |
| VDI desktop resolution | 1600×1200 |

Description of Experiments

We ran three different tests: two used PowerPoint, and the third used a web page with content from YouTube. We ran each test on all three VDI desktop configurations: CPU only, vSGA, and vGPU.

In the first experiment, we ran all three configurations, and we recorded the contents of the screen (that is, took screenshots) of the VDI desktop and used the screenshots to demonstrate the improvement in user experience, including:

  • Improvement in frames per second (FPS)
  • Smoothness and focus obtained with a vSGA-enabled VDI desktop compared to a CPU-only VDI desktop

The screenshots were made on the VDI desktop, so the remoting protocol was not used in any way and did not have any impact on the quality of the user experience we recorded.

In a second experiment, we installed VMware Horizon in the VDI desktop. We created a second Windows 10 VM, as shown in Figure 3, on the same server and used this VM as a VMware Horizon 7 client. We connected to the VDI desktop from this Horizon client and measured the FPS, smoothness, and amount of distortion in the image for the three tests described above. The goal of this experiment was to quantify the improvement in user experience from using vSGA for VDI and to demonstrate that the user experience obtained using vSGA is very close to that obtained using vGPU.

Figure 3: Testbed setup for second set of experiments to quantify the user experience improvement from using vSGA.

Finally, in a third experiment, we compared the performance of a VDI desktop using WebGL benchmarks for all three configurations. The results are presented in the next section.

Results

Experiment 1

Figure 4 shows a side-by-side comparison of an animation running in a VDI desktop featuring a CPU only with that of a VDI desktop featuring vSGA technology.

Figure 4: (Click to run animation.) Comparison of a PowerPoint animation running in a CPU-only VDI desktop with that in a VDI desktop with vSGA. The completion of the animation is signaled by the appearance of a red rectangle in the bottom right corner. The CPU-only version displays fewer frames and is less smooth.

The CPU-only version finishes the animation sooner because it displays fewer frames, which also makes it less smooth. The vSGA version displays almost all the frames in the animation, which is the expected behavior.

To quantify this FPS and smoothness advantage due to vSGA, we ran a second test, in which we connected to the VDI desktop from a client in a VM, and we recorded the frames as seen on the Horizon client.

Figure 5 compares the user experience, measured in terms of FPS (blue bars) and smoothness (orange bars), when a remoting protocol is interposed between the observer and the VDI desktop. It shows that the improved user experience persists even with the remoting protocol in the path.

Figure 5: Normalized FPS and smoothness for vSGA and CPU-only VDI desktop with vGPU as the reference. A value of 1.0 is the best. Lower values indicate reduced levels of user experience.

To measure FPS, we first converted the screenshots to grayscale images, applied a Laplacian filter (which reduces the amount of data to process while maintaining the structural aspects), and then computed the structural similarity index (SSIM) for every pair of successive screenshots. (Based on our testing, this method of identifying distinct frames provides greater fidelity than simply computing the SSIM of the unfiltered screenshots.) If the SSIM value was less than a certain threshold, we tested the two images to determine if one was a blurred version of the other. If neither was, we counted them as distinct frames; the number of distinct frames per second of capture gives the FPS. The pair-wise SSIM values for a sequence of screenshots constituted a time series, from which we computed a smoothness metric. The normalized smoothness metric is shown in Figure 5, above.
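The frame-counting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the measurements: it reads the Laplace step as a discrete 4-neighbour Laplacian filter, uses a single-window SSIM over the whole image rather than a windowed one, and the threshold value, the function names, and the pluggable `is_blurred_pair` check are all assumptions made for the example. The blur test itself is not specified in this blog, so the default here never reports a blurred pair.

```python
from typing import Callable, List, Sequence, Tuple

Gray = List[List[float]]  # grayscale image: rows of intensities in 0..255


def to_gray(rgb: Sequence[Sequence[Sequence[float]]]) -> Gray:
    """Convert rows of (R, G, B) pixels to luma using the ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def laplacian(img: Gray) -> Gray:
    """4-neighbour Laplacian on interior pixels; needs an image of at least 3x3."""
    h, w = len(img), len(img[0])
    return [
        [
            img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
            for x in range(1, w - 1)
        ]
        for y in range(1, h - 1)
    ]


def global_ssim(a: Gray, b: Gray, dynamic_range: float = 255.0) -> float:
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) * (v - mx) for v in xs) / n
    vy = sum((v - my) * (v - my) for v in ys) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / n
    c1, c2 = (0.01 * dynamic_range) ** 2, (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def count_distinct_frames(
    frames: Sequence[Gray],
    threshold: float = 0.95,  # assumed value; the blog does not state its threshold
    is_blurred_pair: Callable[[Gray, Gray], bool] = lambda a, b: False,  # placeholder check
) -> Tuple[int, List[float]]:
    """Return (distinct frame count, pair-wise SSIM time series) for successive screenshots."""
    edges = [laplacian(f) for f in frames]
    series = [global_ssim(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]
    distinct = 1 if frames else 0  # the first screenshot always counts as a frame
    for i, ssim in enumerate(series):
        if ssim < threshold and not is_blurred_pair(frames[i], frames[i + 1]):
            distinct += 1
    return distinct, series
```

Dividing the returned count by the capture duration in seconds gives an FPS estimate, and the returned series is the SSIM time series from which a smoothness metric could be derived.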

The CPU-only VDI desktop shows markedly lower smoothness for the animation. The smoothness and FPS obtained using the vSGA stack are very close to those obtained using the vGPU stack, which demonstrates that the user experience with vSGA is close to that with vGPU and significantly better than with a CPU-only solution.

For a second comparison of user experience, we played a simple video embedded in a PowerPoint slide. We recorded screenshots on the VDI desktop during playback, once with the CPU-only configuration and once with the vSGA graphics stack. A side-by-side comparison is shown in Figure 6.

Figure 6: (Click to run animation.) Side-by-side comparison of the quality of an embedded video in PowerPoint when played using a CPU-only VDI desktop, and a VDI desktop with vSGA. The CPU-only playback shows a significant number of artifacts and a much reduced frame rate.

Experiment 2

In a second experiment, we recorded the playback of this video embedded in a PowerPoint slide from a Horizon client. We analyzed the screenshots to compute the FPS for this embedded video playback on the three configurations: CPU-only, vSGA, and vGPU. We also computed a measure of the number of pixels impacted by artifacts for the CPU-only, vSGA, and vGPU configurations. We normalized this artifact measure using the vGPU configuration as the baseline. This data is shown in Figure 7.

Figure 7: Comparison of the normalized FPS and the normalized measure of the number of pixels impacted by artifacts for embedded video playback using CPU-only, vSGA, and vGPU configurations. The artifact data shows that the CPU-only configuration has many more artifacts than the vSGA configuration. The data also shows that the FPS in the CPU-only configuration is about one-third of that in the vSGA configuration.
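This blog does not spell out how the artifact measure is computed. One plausible way to count pixels impacted by artifacts, shown here purely as an assumption, is to compare each captured frame against a clean reference frame and count the pixels whose intensity differs by more than a tolerance. The function name, the reference-frame approach, and the tolerance value are all hypothetical.

```python
from typing import List

Gray = List[List[float]]  # grayscale image: rows of intensities in 0..255


def count_artifact_pixels(frame: Gray, reference: Gray, tolerance: float = 16.0) -> int:
    """Count pixels that differ from the reference by more than `tolerance` (assumed value)."""
    if len(frame) != len(reference) or any(
        len(rf) != len(rr) for rf, rr in zip(frame, reference)
    ):
        raise ValueError("frame and reference must have the same dimensions")
    return sum(
        1
        for row_f, row_r in zip(frame, reference)
        for p, q in zip(row_f, row_r)
        if abs(p - q) > tolerance
    )
```

Dividing each configuration's count by the count obtained for the vGPU configuration would then yield a normalized value of the kind plotted in Figure 7.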

For a third comparison of user experience, we captured screenshots, shown in Figures 8 and 9, on the VDI desktop while playing a YouTube video. Figure 8 shows a side-by-side comparison of two screenshots: one from a CPU-only configuration, and the other from a vSGA configuration. The figure shows the CPU-only version is so badly blurred that the letters are illegible. The vSGA configuration shows no such artifacts.

Figure 8: Side-by-side comparison of screenshots taken while playing a YouTube video in a VDI desktop with two different configurations: CPU only on the left and a vSGA configuration on the right. These screenshots were taken on the VDI desktop with no remoting protocol involved.

Figure 9: Side-by-side comparison of screenshots taken while playing a YouTube video in a VDI desktop with vSGA and the vGPU stacks. There is no noticeable difference between the images with these two stacks.

Experiment 3

In a third set of experiments, we ran some WebGL benchmarks using all three configurations: CPU-only, vSGA, and vGPU. The data obtained by running these benchmarks is shown in Table 2.

Table 2: Comparison of WebGL benchmark performance for all three configurations.

| Test / Benchmark | vSGA | CPU-only | vGPU (M60-1b) |
|---|---|---|---|
| WebGL Aquarium | 40 fps | 4 fps | 60 fps |
| WebGL Unity3D | 42,371 | 23,020 | 56,307 |
| WebGL Bmark | 1174 | 720 | 2079 |

From the benchmark results, we can see that vSGA performance is significantly higher than that of the CPU-only configuration and approaches vGPU performance: across the three benchmarks, vSGA reaches roughly 56% to 75% of the vGPU result. The gap over the CPU-only configuration is largest for the WebGL Aquarium benchmark, where vSGA delivers ten times the frame rate (40 fps vs. 4 fps).
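To make the comparison in Table 2 concrete, the short script below computes, for each benchmark, the vSGA result as a fraction of the vGPU result and as a multiple of the CPU-only result. The numbers are copied from Table 2; the dictionary layout and function name are only illustrative.

```python
# Results copied from Table 2 (Aquarium is in fps; the other two are benchmark scores).
TABLE_2 = {
    "WebGL Aquarium": {"vSGA": 40, "CPU-only": 4, "vGPU": 60},
    "WebGL Unity3D": {"vSGA": 42371, "CPU-only": 23020, "vGPU": 56307},
    "WebGL Bmark": {"vSGA": 1174, "CPU-only": 720, "vGPU": 2079},
}


def relative_scores(table):
    """Per benchmark: vSGA as a fraction of vGPU and as a multiple of CPU-only."""
    return {
        name: {
            "vsga_over_vgpu": row["vSGA"] / row["vGPU"],
            "vsga_over_cpu": row["vSGA"] / row["CPU-only"],
        }
        for name, row in table.items()
    }


for name, r in relative_scores(TABLE_2).items():
    print(f"{name}: {r['vsga_over_vgpu']:.2f} of vGPU, {r['vsga_over_cpu']:.1f}x CPU-only")
```

For these inputs, vSGA comes out at 0.67, 0.75, and 0.56 of the vGPU result and at 10.0x, 1.8x, and 1.6x the CPU-only result for Aquarium, Unity3D, and Bmark respectively.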

Key Takeaways

  • VDI desktops that run today’s operating systems need GPU support to deliver a reasonably acceptable level of user experience. The VMware vSGA stack offers a user experience that is superior to that with a CPU-only desktop, and close to that available using NVIDIA GRID vGPU.
  • For a typical VDI environment in which good graphics performance is desirable, including slideshow animation and video stream playback, we recommend enabling vSGA with hardware acceleration.
  • Both the vSGA and vGPU stacks support vMotion. However, vGPU only allows vMotion of a VM if the source and destination servers have identical GPUs and matching drivers. vSGA has no such limitation; it supports vMotion between different generations of cards or different host driver versions.
  • We have implemented a mechanism to measure the FPS, smoothness, and level of distortion in the contents displayed on the desktop. The mechanism is independent of the applications that are run in the desktop, requires no access to the VDI desktop, and does not require prior knowledge of the applications that are being monitored. We compare the measurements made on the VDI client with reference measurements made, for example, by running the same applications on a local laptop. This comparison gives a quantitative measure of the user experience at the VDI client relative to that on the local laptop.

Future Work

In the future, we plan to:

  • Compare the performance of the three stacks using video playback of instructional/training videos.
  • Design and implement a metric to measure how much “out of focus” a rendered image is. We plan to use such a metric to quantify the improvement in performance of vSGA compared to a CPU-only desktop for screenshots like those in Figures 8 and 9.