VMware vSGA for Content-Rich VDI

vSGA, or Virtual Shared Graphics Acceleration, allows multiple VMware vSphere virtual machines to share hardware GPUs. In previous blog articles we advocated NVIDIA GRID vGPU technology, which remains a good solution for many use cases. In this blog, we compare the performance of vGPU and vSGA, limiting our testing to workloads generated by VMware Horizon 7 VDI desktops. Based on our measurements, some of which we present here, vSGA provides performance very close to vGPU for a variety of software applications, including Microsoft Office, Adobe Acrobat, CAD viewers, YouTube video, and viewing or working with WebGL-based images.

All modern PC operating systems expect the device on which they are running to have a GPU to provide a reasonably acceptable level of user experience. VDI desktops, which run on servers in the cloud, need to support 3D graphics to provide a good user experience. One approach to support 3D graphics for VMware Horizon VDI desktops is to use the NVIDIA GRID vGPU solution. An alternate approach is to use the VMware vSGA stack, shown in Figure 1, to support 3D graphics for Horizon VDI desktops.

Figure 1: VMware’s vSGA and NVIDIA GRID vGPU stacks. Both stacks support 3D graphics in vSphere using NVIDIA GPUs.

Based on our experiments, we recommend that VDI desktops use VMware vSGA to support 3D graphics for many applications, including (but not limited to):

  • Microsoft Office applications
  • Adobe Acrobat
  • YouTube/instructional/training video playback
  • CAD viewers
  • WebGL

We still recommend NVIDIA GRID vGPU on vSphere for compute-intensive applications such as 3D modeling/CAD design, high-performance computing (HPC), data science, and AI. See NVIDIA and VMware Enterprise GPU virtualization for details.

In this blog, we compare the quality of user experience in VDI with three different configurations:

  1. A CPU-only desktop
  2. VDI using vSGA
  3. VDI using GRID vGPU

Our results show:

  • vSGA offers a significant improvement in user experience over a CPU-only VDI desktop.
  • vSGA performance, measured in terms of user experience, is very close to that of GRID vGPU.

Testbed Configuration

Figure 2 and Table 1 show our testbed configuration for the experiments we conducted to compare VMware vSGA vs. NVIDIA GRID vGPU solutions.

Figure 2: Testbed configuration for VDI desktop.

Table 1: VDI Desktop VM Configuration.

Parameter                 Value/Configuration
Memory                    8 GB
Disk                      64 GB
OS                        Windows 10 Enterprise
Applications Installed    Office 2013, Chrome Browser, Adobe Reader
VDI Protocol              Blast
vSGA (3D Memory)          512 MB
vGPU Profile              M60-1b
VMware Horizon Version    7.6
VDI desktop resolution    1600×1200

Description of Experiments

First, we ran three different tests: two used PowerPoint, and the third used a web page with content from YouTube. We ran every test on all three VDI desktop configurations: CPU-only, vSGA, and vGPU.

In the first experiment, we ran all three configurations, recorded the contents of the VDI desktop screen as screenshots, and used the screenshots to demonstrate the improvement in user experience, including:

  • Improvement in frames per second (FPS)
  • Smoothness and focus obtained with a vSGA-enabled VDI desktop compared to a CPU-only VDI desktop

The screenshots were made on the VDI desktop, so the remoting protocol was not used in any way and did not have any impact on the quality of the user experience we recorded.

In a second experiment, we installed VMware Horizon in the VDI desktop. We created a second Windows 10 VM, as shown in Figure 3, on the same server and used this VM as a VMware Horizon 7 client. We connected to the VDI desktop from this Horizon client and measured the FPS, smoothness, and amount of distortion in the image for the three tests described above. The goal of this experiment was to quantify the improvement in user experience from using vSGA for VDI and to demonstrate that the user experience obtained using vSGA is very close to that obtained using vGPU.

Figure 3: Testbed setup for second set of experiments to quantify the user experience improvement from using vSGA.

Finally, in a third experiment, we compared the performance of a VDI desktop using WebGL benchmarks for all three configurations. The results are presented in the next section.


Experiment 1

Figure 4 shows a side-by-side comparison of an animation running in a CPU-only VDI desktop and in a VDI desktop with vSGA.

Figure 4: (Click to run animation.) Comparison of a PowerPoint animation running in a CPU-only VDI desktop with that in a VDI desktop with vSGA. The completion of the animation is signaled by the appearance of a red rectangle in the bottom right corner. The CPU-only version displays fewer frames and is less smooth.

The CPU-only version finishes the animation sooner because it displays fewer frames, which also makes it less smooth. The vSGA version displays almost all the frames in the animation, which is the expected behavior.

To quantify this FPS and smoothness advantage due to vSGA, we ran a second test, in which we connected to the VDI desktop from a client in a VM, and we recorded the frames as seen on the Horizon client.

Figure 5 compares the user experience, measured in terms of FPS (blue bars) and smoothness (orange bars), when a remoting protocol is interposed between the observer and the VDI desktop. The purpose is to show that the improved user experience persists when the desktop is viewed through the remoting protocol.

Figure 5: Normalized FPS and smoothness for vSGA and CPU-only VDI desktop with vGPU as the reference. A value of 1.0 is the best. Lower values indicate reduced levels of user experience.

To measure FPS, we first converted the screenshots to grayscale images, applied a Laplacian filter (which reduces the amount of data to process while preserving the structure of the image), and then computed the structural similarity index (SSIM) for every pair of successive screenshots. (Based on our testing, this method of identifying distinct frames provides greater fidelity than computing the SSIM of the raw screenshots.) If the SSIM value for a pair was below a certain threshold, we tested the two images to determine whether one was a blurred version of the other; if not, we counted them as distinct frames. The pairwise SSIM values for a sequence of screenshots form a time series, from which we computed a smoothness metric. The normalized smoothness metric is shown in Figure 5, above.
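The frame-counting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the measurements in this post: the 3×3 kernel, the single-window ("global") SSIM, the 0.95 threshold, and all function names are assumptions made for the example. The blur check and the smoothness metric are omitted because the post does not define them.

```python
from statistics import mean

# 3x3 Laplacian kernel (an assumption; the post does not say which kernel was used).
LAPLACIAN = ((0, 1, 0), (1, -4, 1), (0, 1, 0))


def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of luma values (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_image]


def laplacian(gray):
    """Filter interior pixels only, so the output is 2 smaller in each dimension."""
    h, w = len(gray), len(gray[0])
    return [
        [
            sum(LAPLACIAN[i][j] * gray[y + i - 1][x + j - 1]
                for i in range(3) for j in range(3))
            for x in range(1, w - 1)
        ]
        for y in range(1, h - 1)
    ]


def global_ssim(a, b, dynamic_range=255.0):
    """SSIM over the whole image as one window (a simplification of windowed SSIM)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = mean(xs), mean(ys)
    vx = mean((v - mx) ** 2 for v in xs)
    vy = mean((v - my) ** 2 for v in ys)
    cov = mean((p - mx) * (q - my) for p, q in zip(xs, ys))
    c1, c2 = (0.01 * dynamic_range) ** 2, (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))


def count_distinct_frames(screenshots, threshold=0.95):
    """Return (distinct frame count, pairwise SSIM series) for successive screenshots.

    A screenshot counts as a new frame when the SSIM between its edge map and
    the previous screenshot's edge map falls below `threshold`.
    """
    edges = [laplacian(to_gray(s)) for s in screenshots]
    ssim_series = [global_ssim(p, q) for p, q in zip(edges, edges[1:])]
    if not edges:
        return 0, ssim_series
    return 1 + sum(1 for s in ssim_series if s < threshold), ssim_series
```

Dividing the distinct-frame count by the capture duration in seconds gives an FPS estimate, and the returned SSIM series is the kind of time series from which a smoothness metric could be derived.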

Clearly, the CPU-only VDI desktop shows markedly lower smoothness for the animation. The smoothness and FPS obtained using the vSGA stack are very close to those obtained using the vGPU stack, which demonstrates that the user experience with vSGA is close to that with vGPU and significantly better than with a CPU-only solution.

For a second comparison of user experience, we played a simple video embedded in a PowerPoint slide. We recorded screenshots on the VDI desktop while the slide was displayed, using a CPU-only configuration and a vSGA graphics stack. A side-by-side comparison is shown in Figure 6.

Figure 6: (Click to run animation.) Side-by-side comparison of the quality of an embedded video in PowerPoint when played using a CPU-only VDI desktop, and a VDI desktop with vSGA. The CPU-only playback shows a significant number of artifacts and a much reduced frame rate.

Experiment 2

In a second experiment, we recorded the playback of this video embedded in a PowerPoint slide from a Horizon client. We analyzed the screenshots to compute the FPS for this embedded video playback on the three configurations: CPU-only, vSGA, and vGPU. We also computed a measure of the number of pixels impacted by artifacts for the CPU-only, vSGA, and vGPU configurations. We normalized this artifact measure using the vGPU configuration as the baseline. This data is shown in Figure 7.

Figure 7: Comparison of the normalized FPS and normalized measure of number of pixels impacted by artifacts for embedded video playback using CPU-only, vSGA, and vGPU configurations. From the data on the normalized measure of artifacts, we can see that the CPU-only configuration has many more artifacts than the vSGA configuration. The data also shows that the FPS in the CPU-only configuration is about one-third that in the vSGA configuration.
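The post does not spell out how the artifact measure is computed. As one possible illustration only, the sketch below counts pixels whose grayscale value deviates from a reference frame by more than a tolerance, then normalizes the per-configuration counts against the vGPU baseline as described above. The reference frame, the tolerance of 16 gray levels, and the function names are all assumptions introduced for this example.

```python
def artifact_pixel_count(observed, reference, tolerance=16):
    """Count pixels whose gray value differs from the reference by more than `tolerance`.

    Both images are rows of gray values with identical dimensions. Comparing
    against a reference frame is an assumption, not the method from this post.
    """
    return sum(
        1
        for obs_row, ref_row in zip(observed, reference)
        for obs, ref in zip(obs_row, ref_row)
        if abs(obs - ref) > tolerance
    )


def normalized_artifact_measure(counts, baseline="vgpu"):
    """Divide each configuration's artifact count by the baseline configuration's count."""
    base = counts[baseline]
    if base == 0:
        raise ValueError("baseline count is zero; the ratio is undefined")
    return {config: count / base for config, count in counts.items()}
```

With this normalization the vGPU configuration is always 1.0, and larger values indicate more pixels affected by artifacts than in the vGPU baseline.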

For a third comparison of user experience, we captured screenshots, shown in Figures 8 and 9, on the VDI desktop while playing a YouTube video. Figure 8 shows a side-by-side comparison of two screenshots: one from a CPU-only configuration, and the other from a vSGA configuration. The figure shows the CPU-only version is so badly blurred that the letters are illegible. The vSGA configuration shows no such artifacts.

Figure 8: Side-by-side comparison of screenshots taken while playing a YouTube video in a VDI desktop with two different configurations: CPU only on the left and a vSGA configuration on the right. These screenshots were taken on the VDI desktop with no remoting protocol involved.

Figure 9: Side-by-side comparison of screenshots taken while playing a YouTube video in a VDI desktop with vSGA and the vGPU stacks. There is no noticeable difference between the images with these two stacks.

Experiment 3

In a third set of experiments, we ran some WebGL benchmarks using all three configurations: CPU-only, vSGA, and vGPU. The data obtained by running these benchmarks is shown in Table 2.

Table 2: Comparison of WebGL benchmark performance for all three configurations.

Test / Benchmark    vSGA      CPU-only    vGPU (M60-1b)
WebGL Aquarium      40 fps    4 fps       60 fps
WebGL Unity3D       42,371    23,020      56,307
WebGL Bmark         1,174     720         2,079

From the benchmark results, we can see that vSGA performance is significantly higher than that of the CPU-only configuration and approaches vGPU performance: vSGA reaches roughly 56% to 75% of the vGPU score across the three benchmarks. In the WebGL Aquarium benchmark, vSGA delivers 40 fps, ten times the 4 fps of the CPU-only configuration.
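To make the comparison concrete, the Table 2 scores can be normalized against the vGPU configuration, the same baseline used in Figures 5 and 7. The short script below is only an illustration of that arithmetic; the dictionary layout and the `normalize` helper are names made up for this example, while the scores themselves are copied from Table 2.

```python
# Scores copied from Table 2 (higher is better for all three benchmarks).
TABLE_2 = {
    "WebGL Aquarium (fps)": {"cpu_only": 4, "vsga": 40, "vgpu": 60},
    "WebGL Unity3D": {"cpu_only": 23020, "vsga": 42371, "vgpu": 56307},
    "WebGL Bmark": {"cpu_only": 720, "vsga": 1174, "vgpu": 2079},
}


def normalize(scores, reference="vgpu"):
    """Express every configuration's score as a fraction of the reference score."""
    ref = scores[reference]
    return {config: value / ref for config, value in scores.items()}


for name, scores in TABLE_2.items():
    norm = normalize(scores)
    speedup = scores["vsga"] / scores["cpu_only"]
    print(
        f"{name}: vSGA = {norm['vsga']:.2f} of vGPU, "
        f"CPU-only = {norm['cpu_only']:.2f} of vGPU, "
        f"vSGA vs. CPU-only = {speedup:.1f}x"
    )
```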

Key Takeaways

  • VDI desktops that run today’s operating systems need GPU support to deliver a reasonably acceptable level of user experience. The VMware vSGA stack offers a user experience that is superior to that with a CPU-only desktop, and close to that available using NVIDIA GRID vGPU.
  • For a typical VDI environment in which good graphics performance is desirable, including slideshow animation and video stream playback, we recommend enabling vSGA with hardware acceleration.
  • Both the vSGA and vGPU stacks support vMotion. However, vGPU only allows vMotion of a VM if the source and destination servers have identical GPUs and matching drivers. vSGA has no such limitation; it supports vMotion between different generations of cards or different host driver versions.
  • We have implemented a mechanism to measure the FPS, smoothness, and level of distortion in the contents displayed on the desktop. The mechanism is independent of the applications that run in the desktop, requires no access to the VDI desktop, and requires no prior knowledge of the applications being monitored. We compare the measurements made on the VDI client with reference measurements made, for example, by running the same applications on a local laptop. This comparison yields a quantitative measure of the VDI user experience relative to that of the local laptop.

Future Work

In the future, we plan to:

  • Compare the performance of the three stacks using video playback of instructional/training videos.
  • Design and implement a metric to measure how much “out of focus” a rendered image is. We plan to use such a metric to quantify the improvement in performance of vSGA compared to a CPU-only desktop for screenshots like those in Figures 8 and 9.


11 comments have been added so far

  1. Two questions:
    1) I noticed the new (well out for a year now) Nvidia Tesla T4 cards are not on the compatibility list for vsga (

    Any ETA on certifying those for vsga? Otherwise the certified card selection is getting rather old for a new build.

    2) “vSGA does not require any additional licensing” This is a bit misleading, as it seems you still pay the NVIDIA tax: it requires GRID Virtual PC licenses per this page:

    Or is that incorrect?

    1. Hi, I’ve removed the info about licensing and I’m investigating. Thanks for bringing this up. I’m not sure about the Tesla T4 card support–investigating this as well.

    2. Hello Julie, do you have updates for the questions asked here ?
      We are also interested if vSGA is still an option. For our 4000 users workload it is sufficient.

    3. vSGA does not require additional licensing. for software rendering with CPU, this is a functionality that comes enabled by default. However, if you want to achieve VM consolidation while using vGPU, the vGPU license will come into play. Compared with dedicating a vGPU to each VM, vSGA lets you pack more VMs onto the same GPU.

      1. is vSGA also available disregarding vSphere Edition?
        so is it also available on vSphere Standard ?

      2. Well, the problem is that you have to buy a license to get access to the NVIDIA Enterprise page to download the vGPU driver. Because they want to be sure to get your money either way, they have disabled acceleration in Windows with the publicly available driver. The T4 is not supported for vSGA, but it works with the new vGPU driver for vSphere 7. I’m testing it right now. I think it’s really bad that NVIDIA is charging you when you are not using their technology, but that’s how that company is. It also reflects badly on VMware.

  2. Any Update on this?

    This article makes a pretty compelling case for vSGA . From what I understand vSGA will still do a better job of allocating GPU resources to some users and falling back to CPU when needed, whereas a vGPU solution would require a dedicated cluster providing vGPU to all users since the integrated driver can’t fallback cleanly in the event of over-commitment or licensing server failure.

    The Nvidia folks seem to feel that vSGA is dead and something we shouldn’t be deploying. Over in the Nvidia forums a rep said ” it is a dying technology we don’t have focus on. Didn’t have a single customer in the last 2 years using vSGA.”

    So while vSGA sounds like a good fit for a non-engineering VDI application, the lack of hardware available to run it makes it a tough choice right now. The options seem strangely limited right now: If you’re on 6.7 U3, nothing from AMD, and if you go the vSGA route, missing the current Nvidia Card (T4). Looking through Nvidia’s website their VDI focus seems largely on engineering applications instead of accelerating basic office VDI.

    So while it’s great to see interest and research from VMware on the issue, I’m hoping to see some compelling hardware solutions appear for this.

    1. Agreed, same concerns as you. This article seems to be in direct opposition to what NVIDIA is telling me. Yes, they want to sell more cards, so they are motivated for people not to use vSGA. But it’s more a question of compatibility. They don’t have a card released in the last few years that supports it. It may work, but not with official support.

      So I’m definitely curious about what the message is here? Is vSGA something that vmware is actively still developing?

  3. Anyone heard any more on this? I’ve had a couple projects on hold trying to figure out the best way forward. As the article above points out, for knowledge workers, vSGA is a very compelling solution.

    @Nisha : your statement
    “vSGA does not require additional licensing. for software rendering with CPU” is not accurate. See this page: where it says “VMware Horizon vSGA: Need this license: GRID vPC – for PC level applications”.

    However, even with the cost of the NVIDIA licensing, this still seems like a powerful solution. But NVIDIA doesn’t support it on current hardware (T4), and they even have a footnote in their release notes specifically excluding it from any but older GPUs.

    So I’m still stuck… I’d feel foolish buying a very old card and having it go obsolete in a year, but I also don’t think full vGPU makes sense for us: Limited additional performance benefits (See above article), and significant administrative limitations – needing entire cluster configured with it, no true vmotion, etc.

    So I suppose we’re left with just not doing any GPU for knowledge worker VDI, which is a shame because we’ve got the budget allocated for hardware and software licenses, but can’t find anything to buy, and ultimately it’s going to diminish the VDI experience for end users.
