Performance Enhancements in View 5.2

View 5.2 became generally available today, and we wanted to take this opportunity to present a high-level overview of some of the performance enhancements that debut with View 5.2 and PCoIP. In this release, PCoIP’s image cache has been significantly improved to allow users on memory-constrained devices to run with much smaller cache sizes. Firstly, support was introduced to efficiently handle situations where image content is shifted vertically, as occurs during scroll operations. Secondly, View 5.2 debuts improved cache compression algorithms that provide significant additional compression of the View client’s image cache. Finally, the cache’s handling of progressive build operations has been made significantly more efficient. Together, these enhancements allow users to achieve significant bandwidth reductions using considerably smaller cache sizes than was achievable with View 5.1:

The above figure illustrates that, for typical office workflows, running View 5.2 with up to a 5X smaller cache can still deliver significant bandwidth savings. A 90MB View 5.2 cache was found to deliver comparable performance to View 5.1 configured with a 250MB cache, and even a 50MB View 5.2 cache delivered the majority of the bandwidth reduction benefits observed from View 5.1 configured with a 250MB cache. This up to 5X reduction in cache size can be a compelling option for memory-constrained thin clients or tablet devices. The maximum image cache size can be configured via GPOs or set on the client device.
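As a quick check of the "up to 5X" figure, the cache sizes quoted above can be turned into reduction factors. This is only a small arithmetic sketch; the function name is ours and is not part of any View tooling.

```python
# Cache sizes quoted in the text, in MB.
VIEW_51_CACHE_MB = 250
VIEW_52_CACHE_MB = (90, 50)


def cache_reduction_factor(baseline_mb: float, smaller_mb: float) -> float:
    """Return how many times smaller the new cache is than the baseline."""
    if baseline_mb <= 0 or smaller_mb <= 0:
        raise ValueError("cache sizes must be positive")
    return baseline_mb / smaller_mb


for size_mb in VIEW_52_CACHE_MB:
    factor = cache_reduction_factor(VIEW_51_CACHE_MB, size_mb)
    print(f"{size_mb}MB cache: {factor:.1f}X smaller than {VIEW_51_CACHE_MB}MB")
```

The 5X figure corresponds to the 50MB case; the 90MB cache that matched View 5.1 performance is a little under 3X smaller.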

Alternatively, users can continue to leverage the default 250MB cache size in View 5.2 and will see reduced bandwidth utilization in comparison with View 5.1:

The above figure illustrates the average bandwidth utilization observed for View 5.2 during a VMware View Planner run in two different WAN environments for out-of-the-box PCoIP configurations. The results are normalized to the View 5.1 baseline, and illustrate that in the 2 Mb/s environment, the average session bandwidth is reduced by around 6%. Moreover, in the “extreme WAN” environment, View 5.2 delivers an almost 10% reduction in bandwidth utilization compared with View 5.1. These reductions can be compelling when consolidating View sessions from a branch office onto a limited-capacity link, or when users are connecting over congested WiFi connections. Furthermore, as would be expected, reducing the number of image blocks being encoded not only reduces bandwidth utilization, but also improves interactivity (faster transmission of updates and the opportunity for higher frame rates, given the reduced bandwidth utilization) and reduces CPU consumption (less encoding work being done).
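To put the branch office case in rough numbers, the sketch below converts a per-session bandwidth reduction into the extra sessions a fixed-capacity link could carry. It assumes that average session bandwidth is the only limit on the link and treats "around 6%" and "almost 10%" as exactly 0.06 and 0.10; the function name is ours.

```python
def extra_session_capacity(bandwidth_reduction: float) -> float:
    """Factor by which session count can grow on a fixed-capacity link.

    bandwidth_reduction is the fractional drop in average per-session
    bandwidth (0.06 for a 6% reduction).
    """
    if not 0.0 <= bandwidth_reduction < 1.0:
        raise ValueError("bandwidth_reduction must be in [0, 1)")
    return 1.0 / (1.0 - bandwidth_reduction)


for label, reduction in (("2 Mb/s WAN", 0.06), ("extreme WAN", 0.10)):
    factor = extra_session_capacity(reduction)
    print(f"{label}: {factor:.3f}X as many sessions on the same link")
```

Under these assumptions, a 10% reduction leaves room for roughly one additional session for every nine already on the link.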

Finally, other PCoIP enhancements that debut with View 5.2 include:

1. GPO settings take immediate effect: many of the performance-oriented GPO settings now take effect immediately, allowing users or administrators to closely customize the behavior of their PCoIP sessions.

2. Relative mouse support: previously, support was only provided for absolute mode. However, certain 3D applications require relative mouse mode, and support for it is introduced in View 5.2.

We will cover all of these optimizations in greater detail in an upcoming View 5.2 Performance and Best Practices Whitepaper.

Turbo-charge View Video Performance

For desktop VMs using VMXNET3 NICs, you can significantly improve the peak video playback performance of your View desktop by simply setting the following registry value to the value recommended by Microsoft:

HKLM\System\CurrentControlSet\Services\Afd\Parameters\FastSendDatagramThreshold to 1500

[As discussed in a Microsoft KB article here]

[N.B. A reboot of the desktop VM is required after changing this registry setting]
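For those who prefer to script the change, the sketch below renders the setting above as the text of a .reg file. It assumes the value is stored as a DWORD; the helper name is ours, and saving and importing the file (followed by the reboot) is left to the reader.

```python
# Full-length form of the HKLM key named in the text, as .reg files require.
AFD_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Afd\Parameters"
)


def render_reg_file(key: str, value_name: str, dword_value: int) -> str:
    """Render a single DWORD value as the text of a .reg file."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # .reg files store DWORDs as eight hexadecimal digits.
        f'"{value_name}"=dword:{dword_value:08x}',
        "",
    ]
    return "\r\n".join(lines)


print(render_reg_file(AFD_PARAMETERS_KEY, "FastSendDatagramThreshold", 1500))
```

Note that 1500 appears in the file as 000005dc, since .reg files use hexadecimal for DWORD values.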

When running full-screen videos at 1080p resolution on a 2vCPU desktop, we see this deliver frame-rate improvements of up to 1.4X.

So, what does this do and why does it deliver these benefits?

The VMXNET3 adapter is a paravirtualized NIC designed for performance that, as of vSphere 5, supports interrupt coalescing. Virtual interrupt coalescing is similar to a physical NIC's interrupt moderation and is useful in improving CPU efficiency for high-throughput workloads. Unfortunately, out-of-the-box, Windows does not benefit from interrupt coalescing in many scenarios (those sending packets larger than 1024 bytes), because after sending a packet, Windows waits for a completion interrupt to be delivered before sending the next packet. By setting FastSendDatagramThreshold to the Microsoft-recommended value of 1500 bytes, you instruct Windows not to wait for the completion interrupt even when sending larger packets. Accordingly, you are allowing View and PCoIP (as well as other applications that send larger packets) to benefit from interrupt coalescing, reducing CPU load and improving network throughput for PCoIP, which translates into significantly improved video playback performance.
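The effect of the threshold can be pictured with a toy model. The sketch below is not how Windows implements the send path; it simply encodes the rule described above (sends larger than the threshold wait for a completion interrupt) and counts how many sends in a made-up mix of datagram sizes would wait under each setting. The sample sizes and function names are hypothetical, and the treatment of a datagram exactly at the threshold is an assumption.

```python
from typing import Iterable

DEFAULT_THRESHOLD_BYTES = 1024  # out-of-the-box behavior described in the text
TUNED_THRESHOLD_BYTES = 1500    # Microsoft-recommended value quoted in the text


def waits_for_completion(datagram_bytes: int, threshold_bytes: int) -> bool:
    """Toy rule: sends larger than the threshold wait for the interrupt."""
    return datagram_bytes > threshold_bytes


def count_waiting_sends(sizes: Iterable[int], threshold_bytes: int) -> int:
    """Count the sends that would wait for a completion interrupt."""
    return sum(1 for size in sizes if waits_for_completion(size, threshold_bytes))


# Hypothetical mix of datagram sizes in bytes, for illustration only.
sample_sizes = [200, 900, 1100, 1300, 1400]

print("default:", count_waiting_sends(sample_sizes, DEFAULT_THRESHOLD_BYTES))
print("tuned:  ", count_waiting_sends(sample_sizes, TUNED_THRESHOLD_BYTES))
```

In this made-up mix, three of the five sends wait with the default threshold and none wait with the tuned value, which is the window in which interrupt coalescing can start to help.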

ThinApped VMware View Clients – a Zero-install solution

Looking around the various VDI forums, there's usually some discussion of VDI clients that don't require any additional plug-ins, add-ons or installation of any kind on the end user device – a zero-install solution. While this discussion is usually focused on access via web browsers, I've recently been playing with ThinApp to deliver a zero-install VMware View client.

At the risk of this sounding overly like a sales pitch, I was pretty impressed by the elegance of this solution; I was able to easily pull my ThinApped View client from the web, a network share or a USB drive and launch a remote desktop using PCoIP on pretty much any Windows device with a simple double click. Maybe not as quick as double clicking on Firefox, but pretty close.

Are there any downsides? I found application launch time to be a little longer than achieved with a traditionally installed client. Also, USB redirection and virtual printing aren't supported, but then these problems plague web clients too. The main benefit I see is that I get away from the install/update hassles, but keep the performance, bandwidth and user experience benefits associated with PCoIP.

As a final note, I would add that I found the process associated with creating a ThinApped version of the View Client to be extremely straightforward and it took me less than 15 minutes.

View 5.1 Performance

In addition to the numerous enhancements detailed here, View 5.1 debuts a number of significant enhancements and optimizations to the PCoIP protocol. In this blog, we detail some of the most beneficial: 

PCoIP efficiency

Continuing refinements to compression protocols and general performance optimization deliver further improvements in PCoIP efficiency and corresponding reductions in CPU consumption. While already performing better than the VDI competition (as illustrated here and here), these enhancements deliver up to an additional 1.3X reduction in PCoIP overheads.

Client optimizations

For this release, the VMware View clients have been extensively optimized, making their protocol handling significantly more streamlined. This is especially apparent on thin clients, where video playback performance is improved by as much as 3X over previous versions, as illustrated in the figure below. Indeed, even relatively low-performance processors can deliver excellent 720p video playback performance. These improvements are available for both x86 and ARM clients.

[Figure: View 5.1 client video playback performance]

Network improvements

PCoIP handling of adverse network conditions has been significantly improved. This is especially beneficial for users connecting wirelessly from tablets or laptops over congested and lossy WiFi networks. These enhancements are most apparent during video playback, where they ensure fluid, high-frame-rate playback; the improvement can be as high as 8X.

Interactivity improvements

Significant improvements have also been made to interactivity, making interaction with the remote desktop noticeably more fluid. As a simple visual test of this improvement, the picture below shows a user rapidly drawing a spiral in mspaint when connected to their remote desktop using both RDP7 and View 5.1. With RDP7, the resulting spiral is obviously formed from rough polygons, whereas with View 5.1 the spiral is significantly smoother. While this test may seem an overly simple example, it is heavily influenced by the speed and frequency at which the client communicates with the remote desktop, and clearly conveys the likely differences in scrolling and dragging performance. In a later blog we will deep-dive on interactive performance, using the user experience techniques we discuss here.

[Figure: spiral drawn in mspaint over RDP7 and over View 5.1]

Comprehensive User Experience Monitoring

In previous posts and VMworld presentations we have talked about scaling VDI solutions, various VDI performance considerations, and bandwidth/quality tradeoffs. In the inaugural issue of the VMware Technical Journal, we talk about the techniques we have developed to enable the automated monitoring of user experience for virtual desktop deployments (VDI), application remoting, cloud computing, and even video streaming and online gaming. The paper can be found here.

VMware View 5 outperforms Citrix XenDesktop

Principled Technologies has just released an interesting whitepaper comparing the performance of VMware View 5 with Citrix XenDesktop 5.5 — the doc can be found here. In this paper, View and XenDesktop are compared using LoginVSI, for both the light and medium workloads, and both network and CPU utilization data are presented.

The key benefit of this study is that it compares the two VDI technologies in an apples-to-apples manner. Out of the box, View and XenDesktop deliver different levels of desktop image quality, with XenDesktop delivering a lower image quality by default. Accordingly, for a meaningful comparison between VDI technologies, it is critical to adjust the quality settings to ensure that both VDI solutions are delivering comparable quality. In this paper, Principled Technologies provide data for both high quality imaging (increasing XenDesktop image quality to match View's default) and lossy quality (decreasing View image quality to match XenDesktop) benchmark runs. In both these usage scenarios, View is found to achieve lower resource utilization in the majority of runs! The only exception is runs in which XenDesktop Flash redirection is enabled. For these runs, Principled Technologies have not configured View video as discussed in our prior blog postings (a configuration which, in many instances, is capable of bringing the bandwidth associated with server-rendered Flash content close to that of redirected streams).

Accordingly, the main take-away from this study is that it is important for customers to determine what desktop imaging quality their particular usage scenario demands and to configure their VDI solutions appropriately, both in any bake-off performance testing and in their final VDI deployment. 

Finally, it is important to note that this study, which uses 3rd-party workload simulation tools, closely matches the findings that we presented at VMworld 2011 using VMware View Planner.

VMware View 5 resource optimization

In last week’s post, we discussed 4 simple settings that we have observed deliver significant resource savings, while preserving user experience for typical desktop users. While we discussed the benefits of each setting in isolation, I just wanted to illustrate the overall gains. For runs using View Planner (which simulates a typical office user, with MS Office apps, browsers, Adobe Reader, video playback, photo albums etc – more details can be found here), we observe a significant reduction in bandwidth when these 4 resource control settings are applied in unison:

[Figure: View Planner bandwidth with the 4 resource control settings applied]

From the above plot it is apparent that the bandwidth reductions resulting from i) disabling build-to-lossless, ii) setting the maximum frame rate to 15, iii) setting maximum audio bandwidth to 100, and iv) performing simple in-guest operations (such as selecting “optimize for visual performance” and disabling ClearType) are mainly additive, and the cumulative benefit is pretty substantial: around a 1.8X reduction from the default! [Particularly compelling, given that for typical office users there is very little difference in user experience]
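Since "X-fold" reductions and percentage savings are easy to mix up, the sketch below converts one to the other. It is plain arithmetic on the 1.8X figure quoted above; the function name is ours.

```python
def fold_to_percent_saved(fold_reduction: float) -> float:
    """Convert an N-fold bandwidth reduction into a percentage saving.

    A 2X reduction means half the bandwidth, i.e. a 50% saving.
    """
    if fold_reduction < 1.0:
        raise ValueError("a reduction factor must be at least 1")
    return (1.0 - 1.0 / fold_reduction) * 100.0


cumulative = fold_to_percent_saved(1.8)
print(f"1.8X reduction = {cumulative:.0f}% less bandwidth than the default")
```

In other words, the combined settings remove a little under half of the default bandwidth.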

4 simple resource optimizations for VMware View 5

By default the VMware View PCoIP protocol dynamically optimizes for the best user experience for the given resource constraints. In the majority of environments, this is the desired approach. However, there can be times where individual users or group administrators are interested in different resource utilization policies and in past blogs and whitepapers we have discussed in detail how to configure PCoIP to optimize for constrained resource consumption. In this post, I just wanted to provide a concise summary of these recommendations by highlighting 4 simple optimizations that our extensive internal testing has shown yield significant benefits:

  1. Disable build-to-lossless: setting enable_build_to_lossless to 0 delivers about a 1.3X reduction in bandwidth for typical office workloads. PCoIP still builds to a high quality lossy image that is virtually indistinguishable from fully lossless for office workloads.
  2. Optimize video frame-rate: setting maximum_frame_rate to 15 reduces video bandwidth by almost 1.7X in many situations, yet continues to deliver a smooth motion experience.
  3. Optimize audio bandwidth: setting audio_bandwidth_limit to 100 reduces audio bandwidth by around 5X, while continuing to deliver good quality sound.
  4. In-guest optimization: setting Windows visual settings to "optimize for performance" reduces bandwidth by over 1.1X for typical office workloads. Additionally, disabling ClearType reduces bandwidth by a further 1.05X. Disabling desktop wallpaper, and setting the screen saver to none, can also deliver bandwidth savings, although the new client image caching support in View 5 often significantly reduces the additional bandwidth traditionally associated with these options. Finally, disabling Windows Update, SuperFetch and Windows indexing significantly reduces redo-log growth, minimizing storage requirements. Full details of in-guest optimizations can be found here.

[N.B. the PCoIP settings can be set via the Windows registry, or via GPO.]
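For the registry route, the three PCoIP settings above could be turned into `reg add` commands along the following lines. This is a sketch under several assumptions: the registry key is passed in by the caller because this post does not name it (the key used in the example is a made-up placeholder), the values are assumed to be DWORDs, and the exact value names in your deployment (which may carry a prefix) should be taken from the whitepaper.

```python
from typing import Dict, List

# Values recommended in the list above; names are as written there.
PCOIP_SETTINGS: Dict[str, int] = {
    "enable_build_to_lossless": 0,
    "maximum_frame_rate": 15,
    "audio_bandwidth_limit": 100,
}


def build_reg_add_commands(
    policy_key: str, settings: Dict[str, int]
) -> List[List[str]]:
    """Build one `reg add` argument list per setting (nothing is executed)."""
    commands = []
    for name, value in settings.items():
        if value < 0:
            raise ValueError(f"{name} must not be negative")
        commands.append(
            ["reg", "add", policy_key, "/v", name,
             "/t", "REG_DWORD", "/d", str(value), "/f"]
        )
    return commands


# Hypothetical placeholder key, for illustration only.
EXAMPLE_POLICY_KEY = r"HKLM\SOFTWARE\Policies\Example\PCoIP"

for command in build_reg_add_commands(EXAMPLE_POLICY_KEY, PCOIP_SETTINGS):
    print(" ".join(command))
```

The function only builds the command lines, so they can be reviewed before anything is run on a desktop image.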

These simple changes significantly decrease bandwidth consumption, increase consolidation ratios, have minimal impact on typical user experiences and represent good defaults in many environments.

VMware View 5 Network Optimization

PCoIP is an adaptive protocol that works to deliver the best possible user experience for any given network and CPU constraints. In the majority of environments, this is the desired approach. However, there can be times when individual users or group administrators are interested in different resource utilization policies. For instance, administrators may not want users consuming too much corporate LAN bandwidth streaming YouTube videos!

The View PCoIP protocol provides a number of options that can be used to impose these constraints on audio and video streaming operations, while only having a minimal impact on quality:

Video

Setting the maximum frame rate to 15 and the maximum initial image quality to 70 or 80 can reduce the bandwidth associated with video playback by 2 to 4X in the LAN environment. Even with the maximum initial image quality reduced to 70, image quality is good, even for high quality mp4 videos.

Audio

Setting the session audio bandwidth limit to 100 can reduce the audio bandwidth by around 5X. Even with this change, audio quality is good.

More details on how to apply these settings (and additional global resource constraint settings) can be found in the View 5 network optimization whitepaper (located here).

VMware View 5.0 Performance and Best Practices White Paper

Following up from our presentations at the recent VMworld discussing VMware View 5.0, we are pleased to announce the availability of our VMware View 5.0 Performance and Best Practices white paper. This paper highlights:

- Optimizations to the View PCoIP protocol that deliver bandwidth savings of up to 75% and improved consolidation ratios of up to 30%

- Optimizations to VMware vSphere 5.0 that benefit VDI deployments

- Performance comparisons with View 4.5, Microsoft RDP and Citrix XenDesktop

- Best practices & tunables on the platform, guest, PCoIP protocol & network, illustrating how users can optimally configure their View deployments

The full white paper can be found here.