For desktop VMs using VMXNET3 NICs, you can significantly improve the peak video playback performance of your View desktop by setting the following registry value as recommended by Microsoft:
HKLM\System\CurrentControlSet\Services\Afd\Parameters\FastSendDatagramThreshold to 1500
[As discussed in a Microsoft KB article here]
[N.B. A reboot of the desktop VM is required after changing this registry setting]
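As a rough sketch of how this change could be scripted, the snippet below builds the equivalent `reg add` command line. The key path, value name, and the value 1500 come from the text above; the REG_DWORD type and the helper names are assumptions for illustration, and the command has to run from an elevated prompt inside the Windows guest.

```python
import subprocess
import sys

# Key and value name are taken from the text above; REG_DWORD is assumed
# to be the value type.
AFD_KEY = r"HKLM\System\CurrentControlSet\Services\Afd\Parameters"
VALUE_NAME = "FastSendDatagramThreshold"


def build_reg_add_command(threshold_bytes: int = 1500) -> list[str]:
    """Return the `reg add` argument list that writes the threshold."""
    if threshold_bytes <= 0:
        raise ValueError("threshold must be a positive number of bytes")
    return [
        "reg", "add", AFD_KEY,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", str(threshold_bytes),
        "/f",  # overwrite an existing value without prompting
    ]


def apply_threshold(threshold_bytes: int = 1500) -> None:
    """Run the command; only meaningful in an elevated prompt on Windows."""
    if sys.platform != "win32":
        raise RuntimeError("this setting only exists on Windows guests")
    subprocess.run(build_reg_add_command(threshold_bytes), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(" ".join(build_reg_add_command()))
```

The desktop VM still needs the reboot noted above before the new value takes effect.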
When running full-screen videos at 1080p resolution on a 2vCPU desktop, we see this deliver frame-rate improvements of up to 1.4X.
So, what does this do and why does it deliver these benefits?
The VMXNET3 adapter is a paravirtualized NIC designed for performance that, as of vSphere 5, supports interrupt coalescing. Virtual interrupt coalescing is similar to a physical NIC's interrupt moderation and is useful in improving CPU efficiency for high-throughput workloads. Unfortunately, out of the box, Windows does not benefit from interrupt coalescing in many scenarios (those sending packets larger than 1024 bytes), because after sending a packet, Windows waits for a completion interrupt to be delivered before sending the next packet. By setting FastSendDatagramThreshold to the Microsoft-recommended value of 1500 bytes, you instruct Windows not to wait for the completion interrupt even when sending larger packets. Accordingly, you are allowing View and PCoIP (as well as other applications that send larger packets) to benefit from interrupt coalescing. This reduces CPU load and improves network throughput for PCoIP, which translates into significantly improved video playback performance.
Looking around the various VDI forums, there's usually some discussion of VDI clients that don't require any additional plug-ins, add-ons or installation of any kind on the end user device – a zero-install solution. While this discussion is usually focused on access via web browsers, I've recently been playing with ThinApp to deliver a zero-install VMware View client.
At the risk of this sounding overly like a sales pitch, I was pretty impressed by the elegance of this solution; I was able to easily pull my ThinApped View client from the web, a network share, or a USB drive and launch a remote desktop using PCoIP on pretty much any Windows device with a simple double click. Maybe not as quick as double clicking on Firefox, but pretty close.
Are there any downsides? I found application launch time to be a little longer than achieved with a traditionally installed client. Also, USB redirection and virtual printing aren't supported, but then these problems plague web clients too. The main benefit I see is that I get away from the install/update hassles, but keep the performance, bandwidth and user experience benefits associated with PCoIP.
As a final note, I would add that I found the process associated with creating a ThinApped version of the View Client to be extremely straightforward and it took me less than 15 minutes.
In addition to the numerous enhancements detailed here, View 5.1 debuts a number of significant enhancements and optimizations to the PCoIP protocol. In this blog, we detail some of the most beneficial:
Continuing refinements to compression protocols and general performance optimization deliver further improvements in PCoIP efficiency and corresponding reductions in CPU consumption. While already performing better than the VDI competition (as illustrated here and here), these enhancements deliver up to an additional 1.3X reduction in PCoIP overheads.
For this release, there has been significant optimization of the VMware View clients, making their protocol handling significantly more streamlined. This is especially apparent on thin-clients, where video playback performance is improved by as much as 3X over previous versions, as illustrated in the figure below. Indeed, even relatively low-performance processors can deliver excellent 720p video playback performance. These improvements are available for both x86 and ARM clients.
PCoIP handling of adverse network conditions has been significantly improved. This is especially beneficial for users connecting wirelessly from tablets or laptops over congested and lossy WiFi networks. These enhancements are most apparent during video playback, where they ensure fluid, high-frame-rate playback -- the improvement can be as high as 8X.
Interactivity has also been improved, making interaction with the remote desktop noticeably more fluid. As a simple visual test of this improvement, the picture below shows a user rapidly drawing a spiral in mspaint when connected to their remote desktop using both RDP7 and View 5.1. With RDP7, the resulting spiral is obviously formed from rough polygons, whereas with View 5.1 the spiral is significantly smoother. While this test may seem an overly simple example, it is heavily influenced by the speed and frequency at which the client communicates with the remote desktop, and it clearly conveys the likely differences in scrolling and dragging performance. In a later blog we will deep-dive on interactive performance, using the user experience techniques we discuss here.
In last week’s post, we discussed 4 simple settings that we have observed deliver significant resource savings, while preserving user experience for typical desktop users. While we discussed the benefits of each setting in isolation, I just wanted to illustrate the overall gains. For runs using View Planner (which simulates a typical office user, with MS Office apps, browsers, Adobe Reader, video playback, photo albums, etc.; more details can be found here), we observe a significant reduction in bandwidth when these 4 resource control settings are applied in unison:
From the above plot it is apparent that the bandwidth reductions resulting from i) disabling build-to-lossless, ii) setting the maximum frame rate to 15, iii) setting maximum audio bandwidth to 100, and iv) performing simple in-guest operations (such as selecting “optimize for visual performance” and disabling ClearType) are mainly additive, and the cumulative benefit is pretty substantial: around a 1.8X reduction from the default! [Particularly compelling, given that for typical office users there is very little difference in user experience]
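To put the "N X reduction" figures into more familiar terms, here is a small arithmetic sketch. The first helper converts a reduction factor into the percentage of bandwidth saved. The second shows why a large per-stream factor (such as the audio one) contributes only in proportion to that stream's share of total traffic. The traffic shares in the usage lines are purely hypothetical and are not View Planner measurements.

```python
def percent_saved(reduction_factor: float) -> float:
    """Convert an 'N X reduction' into the percentage of bandwidth saved."""
    if reduction_factor <= 0:
        raise ValueError("reduction factor must be positive")
    return (1.0 - 1.0 / reduction_factor) * 100.0


def overall_reduction(shares: dict[str, float],
                      factors: dict[str, float]) -> float:
    """Combine per-stream reduction factors, weighted by traffic share.

    `shares` maps a stream name to its fraction of the original bandwidth
    (fractions must sum to 1). Streams without an entry in `factors` are
    treated as unchanged.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    remaining = sum(share / factors.get(name, 1.0)
                    for name, share in shares.items())
    return 1.0 / remaining


if __name__ == "__main__":
    # The cumulative figure quoted above.
    print(f"1.8X reduction saves {percent_saved(1.8):.1f}% of bandwidth")

    # Hypothetical traffic mix, for illustration only.
    mix = {"video": 0.5, "audio": 0.1, "other": 0.4}
    gains = {"video": 1.7, "audio": 5.0}
    print(f"combined: {overall_reduction(mix, gains):.2f}X")
```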
By default the VMware View PCoIP protocol dynamically optimizes for the best user experience for the given resource constraints. In the majority of environments, this is the desired approach. However, there can be times where individual users or group administrators are interested in different resource utilization policies and in past blogs and whitepapers we have discussed in detail how to configure PCoIP to optimize for constrained resource consumption. In this post, I just wanted to provide a concise summary of these recommendations by highlighting 4 simple optimizations that our extensive internal testing has shown yield significant benefits:
- Disable build-to-lossless: setting enable_build_to_lossless to 0 delivers about a 1.3X reduction in bandwidth for typical office workloads. And, PCoIP still builds to a high quality lossy image that is virtually indistinguishable from fully lossless for office workloads.
- Optimize video frame-rate: setting maximum_frame_rate to 15 reduces video bandwidth by almost 1.7X in many situations, yet continues to deliver a smooth motion experience.
- Optimize audio bandwidth: setting audio_bandwidth_limit to 100 reduces audio bandwidth by around 5X, while continuing to deliver good quality sound.
- In-guest optimization: setting Windows visual settings to "optimize for performance" reduces bandwidth by over 1.1X for typical office workloads. Additionally, disabling ClearType reduces bandwidth by a further 1.05X. Disabling desktop wallpaper and setting the screen saver to none can also deliver bandwidth savings, although the new client image caching support in View 5 often significantly reduces the additional bandwidth traditionally associated with these options. Finally, disabling Windows Update, SuperFetch, and Windows indexing significantly reduces redo-log growth, minimizing storage requirements. Full details of in-guest optimizations can be found here.
[N.B. the PCoIP settings can be set via the Windows registry, or via GPO.]
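For the registry route, one option is to generate a .reg file that can be reviewed and then imported on the desktop. The sketch below renders the three PCoIP values from the list above. The registry key path and the "pcoip." prefix on the value names are assumptions on my part and should be checked against the View documentation or the PCoIP GPO template before use.

```python
# ASSUMPTION: key path and "pcoip." value-name prefix are not stated in this
# post; verify them against the View documentation / PCoIP GPO template.
PCOIP_KEY = (r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP"
             r"\pcoip_admin_defaults")

# Values recommended in the list above.
SETTINGS = {
    "pcoip.enable_build_to_lossless": 0,
    "pcoip.maximum_frame_rate": 15,
    "pcoip.audio_bandwidth_limit": 100,
}


def render_reg_file(key: str, settings: dict[str, int]) -> str:
    """Render DWORD settings in the .reg text format (version 5.00)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in settings.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD")
        # .reg files store DWORDs as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_reg_file(PCOIP_KEY, SETTINGS))
```

Generating a file rather than writing the registry directly keeps the change easy to review and to distribute alongside a GPO-based rollout.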
These simple changes significantly decrease bandwidth consumption, increase consolidation ratios, have minimal impact on typical user experiences and represent good defaults in many environments.
PCoIP is an adaptive protocol that works to deliver the best possible user experience for any given network and CPU constraints. In the majority of environments, this is the desired approach. However, there can be times where individual users or group administrators are interested in different resource utilization policies. For instance, administrators may not want users consuming too much corporate LAN bandwidth streaming YouTube videos!
The View PCoIP protocol provides a number of options that can be used to impose these constraints on audio and video streaming operations, while only having a minimal impact on quality:
Setting the maximum frame rate to 15 and the maximum initial image quality to 70 or 80 can reduce the bandwidth associated with video playback by 2 to 4X in the LAN environment. Even with the maximum initial image quality reduced to 70, image quality remains good, even for high quality mp4 videos.
Setting the session audio bandwidth limit to 100 can reduce the audio bandwidth by around 5X. Even with this change, audio quality is good.
More details on how to apply these settings (and additional global resource constraint settings) can be found in the View 5 network optimization whitepaper (located here).
We have talked in previous posts about the ability in View 5 to disable build to lossless (BTL). When BTL is disabled, PCoIP rapidly builds the client image to a high quality, but lossy, image and stops there. By default (with BTL enabled), if the image remains constant, PCoIP continues to refine the image in the background until it reaches a fully lossless state. Stopping the build process when the image reaches the "perceptually lossless" stage can deliver significant bandwidth savings -- for typical office workflows, we are seeing around a 30% bandwidth reduction.
Furthermore, in many situations, the difference between fully lossless and perceptually lossless images can be virtually impossible to discern. During our VMworld presentation, we used the following image to emphasize the quality of perceptually lossless:
In this qualitative comparison, we present a zoom-in of two small images. For both images, View fully lossless and View perceptually lossless (no BTL) images are shown side-by-side for comparison -- hopefully conveying how difficult it is, even when zoomed, to find differences.
To further emphasize the perceptually lossless quality, it’s also interesting to examine quantitative data, for example PSNR (peak signal-to-noise ratio) and RMS (root-mean-square) error. For a fairly complex image (a fall-colors landscape with significant fine detail in the background tree colors), comparing the perceptually lossless build to a fully lossless build in RGB space yields a PSNR of 45.8dB and an RMS error of 1.3. This clearly illustrates how little loss in quality is associated with perceptually lossless images. For instance, consider the RMS error of 1.3: for 32-bit colors, each RGBA component has 8 bits of precision, with values ranging from 0 to 255. For this image, perceptually lossless is introducing an average error of about +/-1.3 to these values, which is fairly negligible for most use cases!
[While PSNR obviously varies from image to image, I'm seeing ~45dB much of the time]
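The two figures quoted above are related: under the standard PSNR definition for 8-bit components (peak value 255), PSNR is 20·log10(255 / RMS error). The sketch below shows that calculation along with a plain RMS error over two equal-length sequences of component values. The function names are mine, and this is not the tool used to produce the numbers in this post.

```python
import math


def rms_error(reference: list[int], candidate: list[int]) -> float:
    """Root-mean-square error between two equal-length value sequences."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("sequences must be non-empty and the same length")
    squared = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    return math.sqrt(squared / len(reference))


def psnr_db(rms: float, peak: float = 255.0) -> float:
    """PSNR in dB for a given RMS error; identical images give infinity."""
    if rms < 0:
        raise ValueError("RMS error cannot be negative")
    if rms == 0:
        return math.inf
    return 20.0 * math.log10(peak / rms)


if __name__ == "__main__":
    # An RMS error of 1.3 on 8-bit components corresponds to roughly 45.8 dB.
    print(f"{psnr_db(1.3):.1f} dB")
```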
At the recent VMworld we mentioned that VMware View 5 introduces PCoIP support for client-side image caching. In our VMworld presentation, we highlighted that, on average, this caching optimization reduces bandwidth consumption by about 30%. However, there are a number of important scenarios where the ability of the PCoIP image cache to capture spatial, as well as temporal, redundancy delivers even bigger benefits.
For instance, consider scrolling through a PDF document. As we scroll down, new content appears along the bottom edge of the window, and the oldest content disappears from the top edge. All the other content in the application window remains essentially constant, merely shifted upward. The PCoIP image cache is capable of detecting this spatial and temporal redundancy. As a result, for scrolling operations, the display information sent to the client device is primarily just a sequence of cache indices -- delivering significant bandwidth savings.
This efficient scrolling has a couple of key benefits:
- On LAN networks, where bandwidth is relatively unconstrained, there’s sufficient bandwidth available for high quality scrolling even when client-side caching is disabled. In these situations, enabling client-side image caching delivers significant bandwidth savings: experimenting with a variety of different applications and content types (text heavy, image heavy, etc.), I'm seeing bandwidth reductions of over 4X compared with caching disabled. Mileage may vary, but I’m seeing this fairly consistently.
- On WAN networks, where bandwidth is fairly scarce, when client-side caching is disabled, scrolling performance is often degraded to stay within the available bandwidth. In these situations, in addition to bandwidth reductions (which vary based on the degree to which scrolling performance is degraded when client-side caching is disabled), client-side caching also ensures smooth, highly responsive scrolling operations even in WAN environments with very constrained bandwidth.
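To make the scrolling case concrete, here is a toy model of a content-addressed image cache. It is not PCoIP's actual algorithm: the row-sized tiles, the SHA-256 lookup, and the hit/miss message format are all illustrative assumptions. Because rows are looked up by content rather than by screen position, a row that has merely shifted upward still hits the cache, so only the newly exposed rows need to be sent in full.

```python
import hashlib


def encode_frame(rows: list[bytes], cache: dict[bytes, int]) -> list[tuple]:
    """Encode a frame as ('hit', cache_index) or ('miss', row_data) messages."""
    messages = []
    for row in rows:
        digest = hashlib.sha256(row).digest()
        if digest in cache:
            # Content already on the client: send only its cache index.
            messages.append(("hit", cache[digest]))
        else:
            # New content: send the data and remember it for later frames.
            cache[digest] = len(cache)
            messages.append(("miss", row))
    return messages


def scroll_demo(total_rows: int = 40, window: int = 20,
                step: int = 2) -> list[int]:
    """Scroll a window down a document; return cache misses per frame."""
    document = [f"line {i} ".encode() * 8 for i in range(total_rows)]
    cache: dict[bytes, int] = {}
    misses_per_frame = []
    for top in range(0, total_rows - window + 1, step):
        messages = encode_frame(document[top:top + window], cache)
        misses_per_frame.append(
            sum(1 for kind, _ in messages if kind == "miss"))
    return misses_per_frame


if __name__ == "__main__":
    # The first frame misses on every row; later frames miss only on the
    # rows newly scrolled into view.
    print(scroll_demo())
```

In this model, after the first frame each scroll step sends full data only for the `step` new rows; everything else goes out as cache indices.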
Slides from our recent VMworld presentation on View 5 PCoIP performance (EUC1987) are now available here. In addition to discussing the latest PCoIP optimizations and best practices in detail, the presentation also includes competitive data.
In recent weeks there’s been growing excitement about the PCoIP enhancements coming to VMware View. For instance, Warren Ponder discussed here how these enhancements reduce bandwidth consumption by up to 75%. Engineers from VMware’s performance team (& Warren) will be talking more about these enhancements and how they translate into real-world performance at the rapidly approaching VMworld 2011 in Las Vegas:
EUC1987: VMware View PC-over-IP Performance and Best Practices
Tuesday, August 30th - 12:00
Wednesday, August 31st - 1:00
EUC3163: VMware View Performance and Best Practices
Tuesday, August 30th - 4:30
Wednesday, August 31st - 4:00
We will also be blogging additional details and performance results as VMworld progresses, followed by a performance whitepaper.