Role of vCenter in VMware View Scalability
By: John Dodge, Director, VMware End User Computing
VMware View Large Scale Design Series, Episode 2: Core View Infrastructure. This is the second in a series of posts on designing and deploying VMware View at large scale. Future posts will look at Security Servers, vCenter server, View Composer, Connection Server component interaction, and more.
In Demystifying Large Scale View Designs (http://blogs.vmware.com/euc/2012/05/demystifying-vmware-view-large-scale-designs.html) I briefly discussed the role of vCenter in scaling View designs. In this post I will explore this topic in more depth.
In my prior blog post I introduced the topic of View Blocks, explaining that View’s very tight integration with VMware vSphere is a key differentiator of VMware View. To quickly recap, a View Block is a desktop resource boundary in a View deployment dedicated to running the highly volatile desktop workloads. A View Block is defined by a vCenter instance managing the ESXi hosts, networking, and datastores hosting View virtual desktops. I shared that our design guidelines suggest keeping the number of View-managed desktops in a View Block to no more than 2,000. This is a guideline rather than a hard limit, meaning that some implementations may succeed with more desktops and others with fewer.
This guideline confuses VCPs and others intimately familiar with vSphere design. vCenter version 4 and later versions support up to 10,000 virtual machines per instance, so 2,000 View-managed desktops can seem ridiculously conservative. As I’ll explain, while the guideline is conservative, there are very good reasons not to manage too many desktops with a single vCenter instance.
One administrative task, many operations
At VMworld 2012 I was fortunate to co-present with one of the senior View product managers, Lebin Cheng (see VMworld 2012 session EUC1470: Demystifying Large Scale Enterprise View Architecture). In our session Lebin shared just some of the vSphere operations behind a single administrative function, a desktop pool recompose.
Figure 1, Summary of vSphere operations during a desktop pool recompose
This is not a complete list of the vSphere operations that occur in a desktop pool recompose; think of it as a “low resolution” illustration meant to show how many operations happen behind the scenes.
As you can see, there is a symphony composed of carefully orchestrated interactions between the Connection Server, vCenter, View Composer, ESX, and the individual desktop VMs. I’d also like to point out that the sheer number of these interactions increases dramatically toward the right side of this illustration. There can be as many as 32 hosts in a vSphere cluster (caveats apply to View 5.1 and earlier; please refer to our excellent product guides and release notes), and each ESXi host may easily exceed 100 desktop VMs. Obviously, all these operations cannot happen simultaneously, so how do we ensure they are all completed reliably?
End of the queue, please
Earlier in this post I mentioned that View enjoys tight integration with vCenter. The interaction between the Connection Servers and vCenter servers is the cornerstone of this integration. In a View cluster (or pod), a single Connection Server is delegated responsibility for vCenter interaction. A large View Pod may manage several vCenter servers, in which case more than one Connection Server may hold a delegated vCenter management role. When a desktop pool deployment task commences, the Connection Server begins sending multiple requests to vCenter, which queues them in the order received. vCenter in turn sends multiple operations to multiple ESXi hosts, and each host adds these operations to its own queue, again in the order received. That’s a lot of queuing!
It is critically important to understand this tiered orchestration when designing large-scale View implementations. Latency in these operations is harmful in environments where users want their desktops and software when they need them (which is “right now,” of course). Maintaining operational concurrency while keeping certain queues short is therefore important, both for completing administrative tasks quickly and for avoiding the timeouts that become inevitable when too many operations are queued at once. To a View administrator, a long list of queued operations translates into a longer maintenance window.
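To make the tiers concrete, here is a toy Python model of the ordering behavior only: requests arrive at vCenter in one FIFO queue, and each is forwarded to a FIFO queue on the ESXi host running the VM. The function name, VM and host names, and operation labels are hypothetical; this is not how View or vCenter is implemented, and it ignores timing and concurrency limits entirely.

```python
from collections import deque


def route_through_tiers(requests):
    """Toy model of tiered FIFO queuing (not View's or vCenter's actual code).

    requests: list of (vm_name, host_name, operation) tuples, in the order a
    Connection Server sends them to vCenter.
    Returns a dict mapping each ESXi host name to its FIFO queue of (vm, op).
    """
    vcenter_queue = deque(requests)      # tier 1: vCenter queues requests in order received
    host_queues = {}                     # tier 2: one FIFO queue per ESXi host
    while vcenter_queue:
        vm, host, op = vcenter_queue.popleft()
        # vCenter forwards each operation to the host running the VM;
        # the host appends it to its own queue, again in order received.
        host_queues.setdefault(host, deque()).append((vm, op))
    return host_queues


if __name__ == "__main__":
    # Hypothetical fragment of a recompose; names are illustrative only.
    requests = [
        ("desk-001", "esx-01", "power_off"),
        ("desk-002", "esx-02", "power_off"),
        ("desk-001", "esx-01", "reconfigure"),
        ("desk-002", "esx-02", "reconfigure"),
        ("desk-001", "esx-01", "power_on"),
    ]
    for host, queue in sorted(route_through_tiers(requests).items()):
        print(host, list(queue))
```

Even in this tiny example, every operation passes through two queues before it runs, which is why depth at any tier adds directly to completion time.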
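The link between queue depth, maintenance window, and timeouts can be sketched with a deliberately simplified model: all operations are queued at once, each takes the same time, and at most a fixed number run concurrently, so they finish in waves. The function name, the 3-minute operation time, and the 60-minute timeout below are assumptions for illustration, not View’s actual timeout values.

```python
import math


def queue_outcome(num_ops, concurrency, op_minutes, timeout_minutes):
    """Simplified model of a burst of queued operations.

    Assumes every operation is queued at time 0, each takes op_minutes,
    and at most `concurrency` run at once (so they complete in waves).
    Returns (window_minutes, timed_out), where timed_out counts operations
    that finish more than timeout_minutes after being queued.
    """
    waves = math.ceil(num_ops / concurrency)
    window_minutes = waves * op_minutes
    timed_out = 0
    for i in range(num_ops):
        # Operation i runs in wave i // concurrency.
        finish = (i // concurrency + 1) * op_minutes
        if finish > timeout_minutes:
            timed_out += 1
    return window_minutes, timed_out


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 operations of 3 minutes each, 60-minute timeout.
    for limit in (20, 50):
        print(limit, queue_outcome(1000, limit, 3, 60))
```

With a limit of 20 the model gives a 150-minute window with 600 operations past the timeout; with 50 it gives 60 minutes and none. The model ignores storage contention, though: in practice each extra concurrent operation also adds I/O load, which is why raising these limits carries real risk.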
Figure 2, Key concurrent tasks
Sharp-eyed View administrators will know that the View admin GUI has a page for specifying concurrent operation limits (View Configuration > Servers > Edit vCenter Server > Edit button).
Tempting, these settings are! Be warned, though: VMware QA testing has found that setting these values too high has a detrimental effect, in some cases worse than setting them too low.
Power operations are important in regular View usage and provisioning operations. For example, if a desktop is powered off when a user attempts to connect to it, View powers on the VM to satisfy the connection request. The maximum number of concurrent power operations is governed by the Max concurrent power operations setting for each vCenter server configured in View. Starting in View 5.0, this limit is set to 50 by default.
The required number of concurrent power operations depends on the peak rate at which desktops are powered on and on the time it takes a desktop to power on, boot, and become available for connection. The recommended power operations limit is the total time it takes a desktop to start (in minutes) multiplied by the peak power-on rate (in desktops per minute). For example, if the average desktop takes two to three minutes to start and you set the concurrent power operations limit to 3 times the peak power-on rate, the default setting of 50 supports a peak power-on rate of 16 desktops per minute (50 / 3 ≈ 16.7, rounded down). Note that optimizing Windows to shorten boot time can dramatically reduce the limit you need.
VMware View waits a maximum of five minutes for a desktop to start. If startup takes longer, other errors are likely to occur. To be conservative, you can set the concurrent power operations limit to 5 times the peak power-on rate. With this conservative approach, the default setting of 50 supports a peak power-on rate of 10 desktops per minute.
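The sizing rule in the two paragraphs above reduces to a multiplication and its inverse. A minimal Python sketch, with function names of our own choosing (they are not part of any View API); it rounds the required limit up and the supported rate down so both estimates err on the safe side.

```python
import math


def required_power_ops_limit(start_minutes, peak_rate_per_minute):
    """Limit = time for a desktop to start (minutes) x peak power-on rate (desktops/minute)."""
    return math.ceil(start_minutes * peak_rate_per_minute)


def supported_peak_rate(power_ops_limit, start_minutes):
    """Inverse: whole desktops per minute a given limit can sustain."""
    return math.floor(power_ops_limit / start_minutes)


if __name__ == "__main__":
    DEFAULT_LIMIT = 50  # default in View 5.0 and later
    print(supported_peak_rate(DEFAULT_LIMIT, 3))   # aggressive, 3-minute start: 16
    print(supported_peak_rate(DEFAULT_LIMIT, 5))   # conservative, 5-minute max wait: 10
    print(required_power_ops_limit(3, 20))         # limit for 20 desktops/minute: 60
```

Plugging in your own measured start time, rather than the 3- and 5-minute figures, is what makes the estimate useful.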
If the concurrent power operations value is set too low, then the VMware View Clients are likely to display the following error message: “The assigned desktop source for this desktop is not currently available. Please try again later.” In addition, the following message appears in the View Connection Server DEBUG logs: “Limits exceeded for operation on <VMname> lockId: <Managed object ID>, queuing for later”.
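One way to check whether the power operations limit is being hit is to count the “queuing for later” messages per VM. The sketch below matches only the message text quoted above; the rest of the log line layout, the sample lines, and the log file name in the comment are assumptions, so adjust the pattern to what your DEBUG logs actually contain.

```python
import re
from collections import Counter

# Message text as quoted above; the rest of each log line (timestamp,
# thread, logger name) is not assumed, so re.search is used instead of match.
QUEUED_RE = re.compile(
    r"Limits exceeded for operation on (?P<vm>.+?) lockId: (?P<lock>[^,]+), queuing for later"
)


def count_queued_operations(lines):
    """Count 'queuing for later' messages per VM name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = QUEUED_RE.search(line)
        if match:
            counts[match.group("vm")] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; real DEBUG log formatting may differ.
    sample = [
        "10:00:01 DEBUG Limits exceeded for operation on desk-001 lockId: vm-101, queuing for later",
        "10:00:02 DEBUG Limits exceeded for operation on desk-001 lockId: vm-101, queuing for later",
        "10:00:02 DEBUG Limits exceeded for operation on desk-002 lockId: vm-102, queuing for later",
        "10:00:03 INFO unrelated message",
    ]
    # To scan a real log file instead (path is hypothetical):
    # with open("connection-server-debug.txt", encoding="utf-8", errors="replace") as f:
    #     print(count_queued_operations(f))
    print(count_queued_operations(sample).most_common())
```

Frequent hits during peak logon periods, together with the client error above, suggest the limit is too low for your power-on rate.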
To get the most from the concurrent power operations setting, first calculate the average desktop boot time from power-off, then determine whether you have sufficient IOPS headroom for either an aggressive or a conservative value. You can determine this by monitoring your storage subsystem utilization while VMs boot. If your storage can handle starting more desktops simultaneously, you can increase the concurrent power operations value.
There is also some value to be gained by tweaking the other options. Max concurrent power operations is optimally set to 50 by default. The other two options, Max concurrent View Composer maintenance operations (which covers refresh, recompose, and rebalance operations) and Max concurrent View Composer provisioning operations (which covers linked clone provisioning), are set conservatively low, and for a good reason: if set too high, these operations are likely to exceed the maximum IOPS throughput of your SAN. This can have the unintended (and nasty) effect of crashing the SAN, or at the very least can overrun your storage cache and slow SAN read and write operations to a crawl.
The best practice for adjusting these settings is a measured approach: first baseline the operation you’re hoping to improve, then increase one value at a time in a small increment and validate the results. Iterate through this process several times until you reach a plateau or a drop-off in performance gains. Bottom line: never make dramatic increases to the default values without testing! (Unless you have a beef with your storage admin, or find user downtime thrilling.)
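A rough way to turn storage monitoring into a number is to divide the IOPS headroom left during normal load by the IOPS one desktop consumes while booting. This is a sketch under stated assumptions: the function name, the 20% reserve, and the example figures are hypothetical, and all inputs should come from your own measurements.

```python
def max_concurrent_boots(array_iops, steady_state_iops, boot_iops_per_desktop, reserve_pct=20):
    """Rough estimate of how many desktops can boot at once within storage headroom.

    All inputs should come from your own monitoring; reserve_pct keeps a
    safety buffer below the array's rated IOPS. Integer arithmetic throughout.
    """
    usable_iops = array_iops * (100 - reserve_pct) // 100
    headroom = usable_iops - steady_state_iops
    if headroom <= 0:
        return 0  # no spare capacity for boot storms
    return headroom // boot_iops_per_desktop


if __name__ == "__main__":
    # Illustrative figures only.
    print(max_concurrent_boots(40_000, 20_000, 300))   # 40
```

Treating concurrent boots as roughly equal to concurrent power operations, an estimate well above your current limit hints that there may be room to raise it, while an estimate below it suggests the current value is already aggressive for your storage.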
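The baseline, adjust, and validate loop described above can be expressed as a small search that stops at the first plateau or drop-off. In this sketch, `measure` is a hypothetical callback standing in for a real, validated test run at a given setting, and the 5% minimum gain is an assumed threshold.

```python
def tune_setting(measure, start, step, max_value, min_gain=0.05):
    """Raise one concurrency setting in small steps until gains plateau or reverse.

    measure(value) -> duration of the operation being tuned at that setting
    (lower is better). In practice each call is a real, validated test run.
    Returns (best_value, best_duration).
    """
    best_value = start
    best_time = measure(start)            # baseline at the starting value
    value = start
    while value + step <= max_value:
        value += step
        duration = measure(value)
        gain = (best_time - duration) / best_time
        if gain < min_gain:               # plateau (small gain) or drop-off (negative gain)
            break
        best_value, best_time = value, duration
    return best_value, best_time


if __name__ == "__main__":
    # Synthetic results standing in for measured recompose times (hours).
    observed = {8: 100.0, 10: 80.0, 12: 70.0, 14: 68.0, 16: 90.0}
    print(tune_setting(observed.__getitem__, start=8, step=2, max_value=16))   # (12, 70.0)
```

Changing one setting per loop keeps each result attributable to a single change, and stopping on a small gain avoids chasing marginal improvements into the zone where storage starts to suffer.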
Scale out to scale up
In VMware’s design recommendations for View 5.1 and earlier, we advise adding a vCenter server for every 2,000 desktops. We refer to the boundary of desktop VMs managed by a vCenter server as a View Block. The guideline is conservative compared to the vCenter maximum supported VM count of 10,000 because of the way View queues operations to vCenter and ESXi, as I mentioned previously. You can achieve some admirable results by tweaking the vCenter concurrency settings in the View admin GUI. In my VMworld session EUC 1430, Demystifying Large Scale View Deployments, I spoke of a Lighthouse customer who reduced recompose time for 3,000 desktops from 100 hours to just 7.5 hours, a reduction of more than 90%. But despite the potential for improvement within a single vCenter server, significantly more concurrency can be achieved by subdividing your environment into smaller View Blocks and tuning the settings for each vCenter server individually. For example, if provisioning or recompose time is still unacceptably high after you have tweaked and validated the concurrency settings, you could subdivide a 1,500-desktop View Block into two 750-desktop blocks, even though you haven’t yet reached the 2,000-desktop recommendation.
If you purchased the View Premier license, the licenses for the vCenter servers managing the View desktops are included, even if you plan to split your View pod into multiple blocks. To be fair, you may incur additional Microsoft licensing fees; not every Microsoft licensing arrangement is the same, so please speak to your Microsoft licensing specialist to be sure.
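The scale-out argument can be illustrated with an idealized calculation: each View Block’s vCenter works through its share of desktops in waves sized by its concurrency setting, and blocks proceed in parallel. The function names, the 12-operation concurrency, and the 20-minute wave time are hypothetical, and the model assumes the blocks do not contend for the same storage; if they share a SAN, the real gain will be smaller.

```python
import math

VIEW_BLOCK_GUIDELINE = 2000   # desktops per vCenter, View 5.1 and earlier guidance


def blocks_needed(total_desktops, per_block=VIEW_BLOCK_GUIDELINE):
    """Minimum number of View Blocks (vCenter servers) under the guideline."""
    return math.ceil(total_desktops / per_block)


def recompose_minutes(desktops, blocks, concurrency_per_vcenter, minutes_per_wave):
    """Idealized recompose time when blocks work in parallel.

    Each block gets an equal share of desktops and works through it in waves
    of `concurrency_per_vcenter` operations. Ignores shared-storage contention.
    """
    per_block = math.ceil(desktops / blocks)
    waves = math.ceil(per_block / concurrency_per_vcenter)
    return waves * minutes_per_wave


if __name__ == "__main__":
    print(blocks_needed(1500), blocks_needed(5000))       # 1 3
    # Hypothetical tuning: 12 concurrent maintenance operations, 20 minutes per wave.
    print(recompose_minutes(1500, 1, 12, 20))             # 2500
    print(recompose_minutes(1500, 2, 12, 20))             # 1260
```

Under these assumptions, splitting the 1,500-desktop block in two roughly halves the recompose window, even though one block is within the 2,000-desktop guideline.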
In the next release of View I expect the concurrency of vCenter operations to increase dramatically, decreasing provisioning and rebalance times. The immediate tangible benefit of these improvements will be fewer vCenter servers per large-scale View Pod. I also expect that the next release will support using the vCenter Server Virtual Appliance (vCSA) in production, a huge benefit to customers looking to lower their Microsoft licensing burden. Watch this space for more details, as I’ll be explaining the improvements we’ve made in the next release.
Our engineering team recognizes that our customers are deploying ever larger View Pods, and a few of our customers are deploying several very large pods. Increasing concurrency and consolidating management of several large pods into one view are critical roadmap items you should expect to see in upcoming releases of View. Keep an eye on http://blogs.vmware.com/euc/ for the latest information on VMware View!