By Hans Bader
So, your company has bought a new set of hardware, referenced the latest white papers and reference architectures, and now will get the virtual machine (VM) densities promised, right? Well, maybe not. White papers and reference architectures are great starting points for designing and building your environment, but unless you are running the same workloads, your mileage may vary. The key to knowing what your infrastructure will support is to proactively perform load testing – before going into production.
Successful load testing is a considerable amount of work; it involves creating synthetic workloads, and understanding the metrics and the impact on the end-user experience. Holistic load testing will bring in different teams: storage, networking, compute, application development, software distribution and virtual infrastructure. Each of these teams has a stake in ensuring a good end-user experience.
Manage, Understand, and Set Expectations
Understand that the performance of a virtual desktop is all about the performance the end user (your customer) is seeing and perceiving. Metrics gathered from VMware vCenter™, PCoIP logs and storage IOPS are all important, but ultimately it is the end user’s perception and experience that matters most. It is easy for an administrator to say, “The VM has 2 GB out of 4 GB of memory free,” but if the user is experiencing poor performance due to network contention, the end user is still unhappy.
You must set the proper expectations and understand what you can test. Generating synthetic load inside the guest is relatively easy with tools such as Iometer. Iometer does a great job of generating I/O load, but does not provide any user experience metrics. With remote desktops the challenge becomes testing PCoIP and client-desktop communication.
Have Your Plan in Place
Have your testing methodology, objectives and metrics documented in advance. It is important to develop your test design before starting the actual load testing process. Think it through completely; map the information flow for the entire load test process, including entry points and process dependencies. If you are going to create a View pool of 1,000 desktops, will the LAN segment where you will be creating the desktops have enough IP addresses available? Anti-virus updates are a known pain point, so include them in your testing scenarios. Also include software updates if applicable.
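The IP address question can be answered before any desktop is provisioned. The following is a minimal sketch using Python's standard ipaddress module; the subnet values and the reserved-address count are illustrative assumptions, not taken from any real environment.

```python
import ipaddress

def usable_hosts(cidr: str) -> int:
    """Return the number of assignable host addresses in an IPv4 subnet."""
    network = ipaddress.IPv4Network(cidr)
    # /31 and /32 networks have no separate network/broadcast addresses.
    if network.prefixlen >= 31:
        return network.num_addresses
    # Ordinary subnets lose the network and broadcast addresses.
    return network.num_addresses - 2

def has_room(cidr: str, desktops: int, reserved: int = 0) -> bool:
    """Check whether a subnet can hold the pool plus any reserved addresses.

    'reserved' is a hypothetical allowance for gateways, servers and
    addresses already leased on the segment.
    """
    return usable_hosts(cidr) - reserved >= desktops

# Assumed example subnets for a 1,000-desktop pool.
print(has_room("10.20.0.0/22", 1000, reserved=10))  # True: 1022 usable addresses
print(has_room("10.20.0.0/24", 1000))               # False: only 254 usable addresses
```

DHCP lease times also matter during repeated test runs, because addresses from a previous pool may still be leased when the next pool is created.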
Understand what is going to be tested and how the testing will impact end users. The end-user experience with virtual machines is more than just performance graphs of the VM in your vCenter inventory. Are you testing a local install of Microsoft Word, or a larger client-server based application? Many of the applications running in a virtual desktop are dependent on systems (databases, web services, etc.) that exist outside the desktop. Do you have an information flow diagram that shows all the systems an application may interact with? Do you know where the choke points are? Adequate desktop resources are not sufficient if you are load testing 1,000 desktops running a CRM application but the CRM environment can only scale to 750 users.
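The choke-point idea reduces to simple arithmetic: the load you can meaningfully test is capped by the smallest capacity in the dependency chain. A minimal sketch, using the 1,000-desktop and 750-user figures from the example above; the function and dictionary names are hypothetical.

```python
def effective_capacity(capacities: dict[str, int]) -> tuple[str, int]:
    """Return the limiting component and the user count it allows."""
    limiter = min(capacities, key=capacities.get)
    return limiter, capacities[limiter]

# Capacities from the example in the text; add one entry per system
# shown on your information flow diagram.
capacities = {
    "virtual desktops": 1000,
    "CRM application": 750,
}

limiter, users = effective_capacity(capacities)
print(f"{limiter} limits the test to {users} concurrent users")
```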
Your End Users Can Help You
During testing do not rely solely upon metrics: your testing must include “eyes on the glass.” Have actual users run through the test scenarios to understand how—as the load increases—the user experience may be impacted. An end user can establish what a good baseline is, what acceptable performance is, and when the end-user experience starts to degrade. These subjective user perceptions can be roughly mapped to network metrics, storage latency or memory usage.
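One rough way to map subjective perception to a metric is to record user ratings at each load level next to a measured value, then find the first level at which the average rating drops below what users call acceptable. The sketch below assumes a 1-to-5 rating scale and uses invented sample values purely to show the mechanics.

```python
from statistics import mean

# (concurrent desktops, storage latency in ms, user ratings on a 1-5 scale)
# Illustrative values only, not measurements from a real test.
observations = [
    (100, 4.0, [5, 5, 4]),
    (500, 9.0, [4, 4, 4]),
    (750, 18.0, [3, 4, 3]),
    (1000, 35.0, [2, 2, 3]),
]

def first_degradation(observations, acceptable=3.5):
    """Return (desktops, latency_ms) at the first load level whose mean
    rating falls below the acceptable threshold, or None if none does."""
    for desktops, latency_ms, ratings in sorted(observations, key=lambda o: o[0]):
        if mean(ratings) < acceptable:
            return desktops, latency_ms
    return None

print(first_degradation(observations))  # (750, 18.0)
```

The latency recorded at that point becomes a candidate alert threshold, to be confirmed against further test runs.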
Documented Test Plans
Leverage existing test plans where possible. Many times there are existing test plans for applications that have been developed in-house. These are company-specific and require domain subject matter experts to create and execute. Utilizing these people can decrease the time and effort required to create and document your current test plans.
Test What is Real
This very important concept is often overlooked. Don’t simply consider CPU and memory consumption of a virtual machine. Running CPU Busy and generating 100 percent CPU usage inside a VM is not realistic. To generate accurate user experience loads you must use appropriate tools, such as:
Proper load testing of your new environment means testing both your architectural and physical designs. It is important to understand how the user load may impact your initial physical design. The number of hosts per cluster, desktops deployed per datastore, and network connectivity all come into play. You may find you have been overly conservative in your resource assumptions; if so, you can change your cluster sizing and obtain greater desktop densities.
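To see how a change in measured density feeds back into the physical design, the sizing arithmetic can be sketched as below. The per-datastore limit and the single spare host are assumptions for illustration; substitute the figures from your own design and test results.

```python
import math

def cluster_sizing(desktops, desktops_per_host, desktops_per_datastore, spare_hosts=1):
    """Return (hosts, datastores) needed for a desktop count.

    spare_hosts is an assumed allowance for maintenance or host failure.
    """
    hosts = math.ceil(desktops / desktops_per_host) + spare_hosts
    datastores = math.ceil(desktops / desktops_per_datastore)
    return hosts, datastores

# Conservative initial assumption: 80 desktops per host.
print(cluster_sizing(1000, 80, 128))   # (14, 8)
# If load testing shows hosts sustain 100 desktops each:
print(cluster_sizing(1000, 100, 128))  # (11, 8)
```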
During your load testing, use this time to understand the impact on typical administrative tasks while running the hosts. For example, how long does it take to spin up a new pool of 500 desktops when you are running a load test with 1,000 desktops? Or how long does it take to put a host in maintenance mode when it has 80 desktops running? The outcomes of these ancillary tests may change the way you administer your environment.
Expose the Weak Links
What if, during your load testing, you break something? Perhaps you’ll run out of DHCP addresses, the KMS server and your hosts start swapping, LUNs run out of space, and VMs crash. These events should not be considered failures, but rather successful tests. These events show you where to focus attention prior to the next load test so real users do not experience these problems during live operations. Yes, load testing can be a lot of work, and take a considerable amount of effort to do effectively, but the end results are worth it: end users and administrators are happy.
Plan for Remediation
Exposing a weak link during load testing is not a failure, but a positive result. Ensure your testing plan has time built in to address any weaknesses that are uncovered and to test again. The amount of time to add depends on the amount of load that broke the system. If load testing early on with fewer users exposes a lack of DHCP addresses, expanding the DHCP scope is a relatively easy fix. On the other hand, if testing at full predicted load uncovers a storage performance bottleneck, the time to procure, install and configure additional storage could be much longer.
Your first fully automated test should be a single system test—a single test to ensure your test plan runs through to completion. With no resource contention and no over-commitment on the hosts, this is your baseline. This should also be correlated with an actual user single system test, ensuring the user experience is what is expected.
For the second test, ramp up to 50 percent of what the calculated capacity is. This gives enough wiggle room so you can determine if your design assumptions are accurate. Do you have enough IP addresses? Is storage able to keep up? How are the memory stats?
Run a third test at 100 percent calculated capacity. This is where getting real users into the system is critical. How long does it take to log in? Are the test scenarios within the acceptable parameters? Is the user experience acceptable? Have you met all your design criteria and business requirements?
Finally, a fourth test at more than 100 percent expected capacity should be run. Add more desktops, start a full anti-virus scan, perform a software update. No matter how well we design, we always have to plan for the worst-case scenarios. The unexpected removal of a host from a cluster dramatically impacts capacity. Put a host in maintenance mode or reboot it without putting it in maintenance mode. How does your environment perform under these extreme conditions?
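The four test stages can be written down as a simple plan derived from your calculated capacity. This is a sketch under stated assumptions: the text only says the final stage should exceed 100 percent, so the 120 percent default here is an arbitrary illustrative choice, and all names are hypothetical. The second function estimates how per-host density changes when a host is lost.

```python
import math
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    desktops: int

def plan_stages(calculated_capacity: int, overload_percent: int = 120) -> list[Stage]:
    """Build the four-stage ramp: baseline, 50 percent, 100 percent, overload."""
    def percent_of(percent: int) -> int:
        # Integer ceiling division avoids floating-point rounding surprises.
        return (calculated_capacity * percent + 99) // 100
    return [
        Stage("single-system baseline", 1),
        Stage("50 percent of capacity", percent_of(50)),
        Stage("100 percent of capacity", calculated_capacity),
        Stage("overload", percent_of(overload_percent)),
    ]

def desktops_per_host_after_failure(desktops: int, hosts: int, failed: int = 1) -> int:
    """Per-host desktop count once 'failed' hosts leave the cluster."""
    remaining = hosts - failed
    if remaining <= 0:
        raise ValueError("no hosts left in the cluster")
    return math.ceil(desktops / remaining)

for stage in plan_stages(1000):
    print(f"{stage.name}: {stage.desktops} desktops")

# Assumed example: 1,000 desktops on 13 hosts, one host removed.
print(desktops_per_host_after_failure(1000, 13))  # 84
```

Comparing the post-failure density with the density your hosts sustained in the 100 percent test shows whether the cluster can absorb the loss.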
“We must contemplate some extremely unpleasant possibilities, just because we want to avoid them.”
– Albert Wohlstetter, American nuclear strategist, 1960
Hans Bader is a Consulting Architect, VMware EUC. Hans has over 20 years of IT experience and joined VMware in 2009. With a focus on helping organizations become operationally ready, he works with customers to avoid common mistakes. He is a strong advocate for proactive load testing of environments before allowing users access. Hans has won numerous consulting awards within VMware.