PCoIP Improvements in VMware View 5.0

PCoIP is VMware View’s VDI display protocol, and one of its key responsibilities is capturing the remote desktop’s audio and video (AV) output and conveying it to the user’s client device. With VMware View 5.0 we introduce a variety of important optimizations to the PCoIP protocol that significantly reduce its resource utilization, benefiting users in almost all usage scenarios. These optimizations fall into two broad categories, bandwidth optimizations and compute optimizations, which are discussed in more detail below.

Bandwidth Improvements

Controlling network bandwidth utilization is a key consideration for VDI display protocols. This is especially true in the WAN environment, where network bandwidth can be a relatively scarce and highly shared resource. View 5.0 makes significant improvements in the efficiency with which PCoIP consumes this resource, while maintaining user experience. In many typical office/knowledge worker environments, bandwidth consumption is reduced by up to 75% (a 4X improvement). The following sections discuss the optimizations that deliver these gains.

Lossless codec

In the VDI environment, a user’s screen is frequently composed of many forms of content, including icons, graphics, motion video, photos and text. It is the responsibility of the VDI display protocol to actively monitor the type of content the user is viewing and dynamically manage the compression algorithms used for each screen region to ensure the best user experience. For instance, naively applying lossy compression techniques to text-oriented content can result in blurred text edges, which can be very noticeable to users. Accordingly, PCoIP uses an efficient lossless compression algorithm that has been developed with text compression as a key consideration in order to minimize both bandwidth and CPU utilization.
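To make the idea of per-region codec selection concrete, here is a minimal sketch. It is not PCoIP's actual classifier, which is not described here. It assumes a deliberately simple heuristic (text-like regions tend to contain few distinct pixel values, photo-like regions many), and the names `choose_codec` and `max_text_colors` are hypothetical.

```python
def choose_codec(tile, max_text_colors=8):
    """Pick a codec for one screen region.

    `tile` is a bytes object of grayscale pixel values (an assumption made
    for brevity). Regions with few distinct values are treated as text-like
    and kept lossless so edges stay sharp; everything else is treated as
    photo- or video-like and allowed to use lossy compression.
    """
    distinct_values = len(set(tile))
    return "lossless" if distinct_values <= max_text_colors else "lossy"


# A two-tone region (black text on white) versus a smooth gradient.
text_like = bytes([0, 255] * 8)
photo_like = bytes(range(64))
print(choose_codec(text_like), choose_codec(photo_like))
```

A production classifier would need to look at more than colour count (anti-aliased text, for example, contains many intermediate shades), so treat the threshold as illustrative only.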

With View 5.0, PCoIP debuts a major enhancement to its lossless compression codec. The improved lossless compression algorithm delivers both greater compression ratios and improved robustness. As an example, the improved algorithm delivers twice the compression of its predecessor when applied to content containing anti-aliased fonts.

The desktop of a typical knowledge worker frequently contains significant text content: text on web pages, emails, presentations and PDF documents. Accordingly, a significant proportion of the imaging data transmitted to the client device is compressed using lossless compression algorithms. As a result, View 5.0’s improved lossless compression algorithm delivers a 30% to 40% reduction in bandwidth consumption for typical knowledge worker workflows.

Client-side image caching

Among its many responsibilities, PCoIP is tasked with efficiently communicating desktop screen updates to the client device for local display. In many instances, only a small region of the screen changes. VDI protocols such as PCoIP perform spatial filtering and send only the information related to the portion of the screen that changed (rather than naively sending the entire screen). However, in addition to spatial filtering, temporal analysis can also be performed. For instance, consider minimizing an application, dragging a window, flicking through a slide set or even scrolling through a document. In all these examples, each successive screen update will be largely composed of previously seen (potentially shifted) pixels. As a result, if the client device maintains a cache of previously seen image blocks, PCoIP can deliver significant bandwidth savings by encoding these portions of the screen update as a series of cache indices rather than retransmitting the blocks.

View 5.0 introduces a client-side image cache, providing bandwidth savings of 30% in many instances (typical knowledge worker workflows). This cache is not merely a simple fixed-position cache, but captures both spatial and temporal redundancy in the screen updates.
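The following is a minimal sketch of the general technique described above, not of PCoIP's implementation. It assumes fixed-size tiles, grayscale pixels, SHA-1 digests as cache keys and an unbounded cache; `TILE`, `Encoder` and `Client` are hypothetical names. The encoder skips tiles that have not changed at their position (spatial filtering) and sends a short reference instead of pixel data for any tile content the client has already seen, wherever it appeared before.

```python
import hashlib

TILE = 4  # hypothetical tile edge length in pixels


def tiles(frame):
    """Yield ((row, col), tile_bytes) for each TILE x TILE block of a 2D frame."""
    for r in range(0, len(frame), TILE):
        for c in range(0, len(frame[0]), TILE):
            block = bytes(px for row in frame[r:r + TILE] for px in row[c:c + TILE])
            yield (r, c), block


class Encoder:
    def __init__(self):
        self.previous = {}  # position -> digest last sent for that position
        self.known = set()  # digests the client is assumed to have cached

    def encode(self, frame):
        ops = []
        for pos, block in tiles(frame):
            digest = hashlib.sha1(block).digest()
            if self.previous.get(pos) == digest:
                continue  # spatial filtering: this tile did not change
            self.previous[pos] = digest
            if digest in self.known:
                ops.append(("ref", pos, digest))  # cache hit: send the key only
            else:
                self.known.add(digest)
                ops.append(("raw", pos, block))  # cache miss: send the pixels
        return ops


class Client:
    def __init__(self):
        self.cache = {}   # digest -> tile bytes
        self.screen = {}  # position -> tile bytes currently displayed

    def apply(self, ops):
        for kind, pos, payload in ops:
            if kind == "raw":
                self.cache[hashlib.sha1(payload).digest()] = payload
                self.screen[pos] = payload
            else:
                self.screen[pos] = self.cache[payload]


# Two tiles swap places: the second frame is sent entirely as references.
encoder, client = Encoder(), Client()
frame_a = [[1] * 4 + [2] * 4 for _ in range(4)]
frame_b = [[2] * 4 + [1] * 4 for _ in range(4)]
for frame in (frame_a, frame_b):
    ops = encoder.encode(frame)
    client.apply(ops)
    print([kind for kind, _, _ in ops])
```

Because tiles are keyed by content rather than position, moved content is recognized, but only when it lands on the tile grid. A real cache would also need a size bound and an eviction policy that encoder and client agree on.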

Total Bandwidth Improvements

In combination, the compression improvements and image caching deliver bandwidth savings of around 60% (a 2.5X improvement) out of the box in both LAN and WAN use cases for typical knowledge workers.

Additional bandwidth improvements can be obtained in View 5.0 by leveraging the new image quality controls that have been introduced. By default, PCoIP builds to a lossless image. When a screen update occurs, PCoIP almost immediately transmits an initial image for display on the client. In rapid succession, PCoIP continues to refine the client’s image until a high-quality lossy image is achieved. In PCoIP vernacular, this is termed building to a “Perceptually lossless” image. If the screen remains constant, PCoIP will, in the background, continue to refine the image on the client until a lossless image is obtained; this is termed build to lossless (BTL). In certain application spaces, building to a lossless image is a key feature. However, for many knowledge workers, BTL support can be disabled without noticeable impact on image quality. Disabling BTL can also deliver significant bandwidth savings: in many situations it provides a reduction in bandwidth of up to around 30%.
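The build sequence described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions: it collapses the refinement into three named stages and one step per tick, whereas the text describes continuous refinement, and `STAGES`, `refine` and `stages_sent` are hypothetical names rather than PCoIP settings or APIs.

```python
STAGES = ["initial", "perceptually_lossless", "lossless"]


def refine(stage, build_to_lossless=True):
    """Return the next quality stage for a screen region that has not changed."""
    ceiling = "lossless" if build_to_lossless else "perceptually_lossless"
    if STAGES.index(stage) >= STAGES.index(ceiling):
        return stage  # already at the highest stage this configuration allows
    return STAGES[STAGES.index(stage) + 1]


def stages_sent(static_ticks, build_to_lossless=True):
    """List the stages transmitted while the screen stays static for `static_ticks`."""
    stage = "initial"
    sent = [stage]
    for _ in range(static_ticks):
        next_stage = refine(stage, build_to_lossless)
        if next_stage == stage:
            break  # nothing further to send
        stage = next_stage
        sent.append(stage)
    return sent


print(stages_sent(5, build_to_lossless=True))
print(stages_sent(5, build_to_lossless=False))
```

With BTL disabled the final lossless refinement is never sent, which is where the additional bandwidth saving comes from in this model.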

Combining the compression improvements and client-side caching with disabling BTL delivers a bandwidth reduction of up to 75% (a 4X improvement) for typical office workloads.
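As a rough sanity check on how the individual figures relate to the totals, the sketch below assumes that the savings compound multiplicatively. That is an assumption for illustration, not something the measurements above state. It also shows how a percentage reduction converts into an "NX" improvement factor. The function names are hypothetical.

```python
def combined_reduction(*reductions):
    """Combine fractional reductions, assuming each applies to what remains."""
    remaining = 1.0
    for reduction in reductions:
        remaining *= 1.0 - reduction
    return 1.0 - remaining


def improvement_factor(reduction):
    """Convert a fractional reduction into an 'NX' improvement factor."""
    return 1.0 / (1.0 - reduction)


# Lossless codec (30% to 40%) combined with image caching (30%).
low = combined_reduction(0.30, 0.30)
high = combined_reduction(0.40, 0.30)
print(f"codec + cache: {low:.0%} to {high:.0%}")

# The same, with BTL disabled (a further 30%).
low_btl = combined_reduction(0.30, 0.30, 0.30)
high_btl = combined_reduction(0.40, 0.30, 0.30)
print(f"codec + cache + BTL off: {low_btl:.0%} to {high_btl:.0%}")

# Converting the quoted totals into improvement factors.
print(f"60% -> {improvement_factor(0.60):.1f}X, 75% -> {improvement_factor(0.75):.1f}X")
```

Under this assumption the components give roughly 51% to 58% without the BTL change and roughly 66% to 71% with it, which is in the neighbourhood of the "around 60%" and "up to 75%" totals quoted above; the measured totals need not follow this simple model exactly.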

CPU Improvements

In VDI environments, desktop consolidation is a key consideration. The more user desktops that can be handled per system (i.e. the higher the consolidation ratio), the better the cost savings that can be realized. Accordingly, the CPU overheads introduced by the VDI protocol must be carefully constrained. With View 5.0, PCoIP has been further enhanced to minimize its CPU overhead in a number of significant ways.

Idle CPU usage

From the VDI protocol’s perspective, unless the user is viewing a video, the user is idle for a large proportion of the time. For instance, if a user loads a new web page, there is a flurry of activity as the web page loads and the screen update is displayed, but many seconds or even minutes may elapse with the screen remaining static as the user reads the content of the page. For a VDI protocol, it is important not only to encode screen changes efficiently, but also to minimize the overheads associated with all background activities that occur during these idle periods.

With View 5.0, we have significantly optimized these code paths, and PCoIP’s idle CPU consumption is now negligible. Further, the session keepalive (heartbeat) bandwidth has been reduced by a factor of 2 for many workloads.

Optimized algorithms and code

In View 5.0, many of the hottest image processing and compression functions have been reexamined, their algorithms tweaked for efficiency and their implementations further optimized. In some cases they have even been coded in assembly to achieve the lowest possible computational overhead.

Effectively using hardware instructions

Image manipulation operations are typically well suited to acceleration using SIMD (Single Instruction Multiple Data) instructions, such as the SSE instructions supported on recent x86 processors. With View 5.0, PCoIP has been optimized to take even greater advantage of the SSE SIMD support available on x86 processors, not only expanding the portion of the code base that uses it, but also, when available, leveraging the instruction-set extensions found on the very latest processors (e.g. SSE 4.2 and AES-NI).

Conclusion

With View 5.0, we have spent significant time optimizing PCoIP to further reduce both its bandwidth and CPU consumption, delivering improved responsiveness, improved consolidation ratios and improved WAN scalability.