

Virtualizing XenApp on XenServer 5.0 and ESX 3.5

There has always been interest in running Citrix XenApp (formerly Citrix Presentation Server) workloads on the VMware Virtual Infrastructure platform. With the advent of multi-core systems, purchasing decisions are driven towards systems with 4-16 cores. However, using this hardware effectively is difficult due to limited scaling of the XenApp application environment. In addition to the usual benefits of virtualization, these scaling issues make running XenApp environments on ESX even more compelling.

We recently ran performance tests to understand what can be expected from a virtualized XenApp workload. The results show that ESX runs common desktop applications on XenApp with reasonable overhead compared to a native installation, and with significantly better performance than XenServer. We hope this data will help provide guidance when XenApp environments are transitioned from physical hardware to a virtualized environment.

Together with partners, we have been developing a desktop workload for over a year. The workload has been tested extensively on virtual desktop infrastructure (VDI) environments with one user per virtual machine (VM). VDI results have been presented and published in numerous locations (e.g., http://www.vmware.com/resources/techresources/1085, VMworld 2008 presentation VD2505 with Dell-EqualLogic). Great attention was paid to selecting the most relevant applications as well as to specifying the right types and amount of work each should do. Many other Terminal Services-style benchmarks fail to be representative of actual desktop users. Porting the workload from a VDI environment to the XenApp environment was straightforward.

XenApp was run in a single 14 GB, 2-vCPU virtual machine (VM) booted with Windows Server 2003 x64. The hypervisors used were ESX 3.5 U3 and XenServer 5.0. The VMs on both had the appropriate tools/drivers installed, and the XenServer VM had the Citrix XenApp optimization enabled. For comparison, the tests were also run natively with the OS restricted to the same hardware resources. The hardware was an HP DL585 with four quad-core 2210 MHz “Barcelona” processors and 64 GB of memory. Rapid Virtualization Indexing (RVI) was enabled.

The test consists of 22 operations, always executed in the following order:

IE_OPEN_2: Open Internet Explorer
IE_ALBUM: Browse photos in IE
EXCEL_OPEN_2: Open Excel file
EXCEL_FORMULA: Evaluate formula in Excel
EXCEL_SAVE_2: Save Excel file
FIREFOX_OPEN: Open Firefox
FIREFOX_CLOSE: Close Firefox
ACROBAT_OPEN_1: Open PDF file
ACROBAT_BROWSE_1: Browse PDF file
PPT_OPEN: Open PowerPoint file
PPT_SLIDESHOW: Slideshow in PowerPoint
PPT_EDIT: Edit PowerPoint file
PPT_APPEND: Append to PowerPoint file
PPT_SAVE: Save PowerPoint file
WORD_OPEN_1: Open Word file
WORD_MODIFY_1: Modify Word file
WORD_SAVE_1: Save Word file
IE_OPEN_1: Open Internet Explorer
IE_APACHE: Browse Apache doc in IE
EXCEL_OPEN_1: Open Excel file
EXCEL_SORT: Sort column in Excel
EXCEL_SAVE_1: Save Excel file

A “sleep” of random length is inserted between each operation to simulate user think time. One execution of the whole set of operations is called an “iteration” and takes about 57 minutes. Several of these operations consist of many sub-operations. For instance, the PPT_SLIDESHOW operation consists of 10 sub-operations, each of which displays a slide in a PowerPoint document followed by a pause. Only the latency to display the slide is timed, not the time spent sleeping. The latencies of the sub-operations are summed to give the operation latency, and all the operation latencies within one iteration of one user are summed to yield the “total latency”.

AutoIt3, an open-source scripting language, is used on the server side to automate the operations. CSTK Client Launcher (a utility that allows the tester to create and launch multiple ICA client sessions) is used on a client machine to start the users (sessions). The users are started in a staggered fashion so that the last user is starting as the first user is close to finishing its first iteration. This strategy avoids synchronizing the execution of any operation across users.

Each user runs six iterations. The “average total latency” is determined by averaging the total latencies across the middle four iterations (i.e., the ones where all users are running at steady state) and across all users.

Note that it is important to time many different kinds of desktop applications: timing just a few operations (or even just one, as has been done in other publications) can give a very distorted view of overall performance. With a similar philosophy, we gather CPU data over nearly four hours of steady state to ensure the utilization statistics are solid. The first figure shows the average total latency as a function of the number of users for XenServer, ESX, and Native.
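The aggregation of latencies can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the actual harness (which scripts operations with AutoIt3 and launches ICA sessions with the CSTK Client Launcher); the function names and the tiny example data are assumptions made for this sketch:

```python
# Sketch of the latency aggregation described above (illustrative only).

def operation_latency(sub_op_latencies):
    """An operation's latency is the sum of its timed sub-operations;
    think-time sleeps between sub-operations are not counted."""
    return sum(sub_op_latencies)

def total_latency(iteration):
    """Total latency for one iteration of one user: the sum of all
    22 operation latencies."""
    return sum(operation_latency(subs) for subs in iteration)

def average_total_latency(users):
    """Average the total latency over the middle four of six
    iterations (steady state), across all users."""
    totals = [total_latency(it) for user in users for it in user[1:5]]
    return sum(totals) / len(totals)

# Tiny example: 2 users, 6 iterations each, each iteration made of 22
# operations that each have a single 1-second sub-operation.
users = [[[[1.0]] * 22 for _ in range(6)] for _ in range(2)]
print(average_total_latency(users))  # 22.0 for this uniform example
```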

[Figure: average total latency vs. number of users for XenServer, ESX, and Native]

The two horizontal lines labeled “QoS” denote the Native latency for 35 and 38 users. Either of these may be considered a reasonable maximum Quality of Service for latency. They correspond to somewhat less or somewhat more than half, respectively, of the available CPU resources (see the CPU figure below), which is a commonly used target for XenApp. At higher utilizations, not only does the latency increase rapidly, but operations may start to fail. We required that all operations succeed (just as a real user expects!) for a test to be deemed successful. The points where the QoS lines cross the ESX and XenServer curves give the number of users that can be supported with the same total latency. Normalizing by the number of Native users (35 or 38) gives the fraction of Native users each virtualization product can support at the given total latency:

[Figure: fraction of Native users supported by ESX and XenServer at the QoS latencies]
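Reading off where a latency-vs-users curve crosses a QoS threshold amounts to a linear interpolation between measured points. A minimal sketch of that step, using made-up data points rather than the measured results:

```python
# Sketch: find where a latency-vs-users curve reaches a QoS latency
# threshold by linear interpolation between adjacent measured points.
# All numbers below are hypothetical, not the results from the article.

def users_at_qos(users, latencies, qos_latency):
    """Return the (fractional) user count at which the latency curve
    reaches qos_latency, interpolating linearly between points."""
    points = list(zip(users, latencies))
    for (u0, l0), (u1, l1) in zip(points, points[1:]):
        if l0 <= qos_latency <= l1:
            frac = (qos_latency - l0) / (l1 - l0)
            return u0 + frac * (u1 - u0)
    raise ValueError("QoS latency outside measured range")

# Hypothetical measured points (user count, total latency in seconds):
user_counts = [20, 25, 30, 35, 40]
esx_latency = [100, 110, 125, 150, 200]
qos = 140  # hypothetical Native latency at the QoS user count
print(users_at_qos(user_counts, esx_latency, qos))  # 33.0
```

Dividing the interpolated user count by the Native count at the same latency gives the normalized fraction plotted in the figure above.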

ESX consistently supports about 86% of the native number of users, while XenServer supports about 77%. Shown below is the average CPU utilization during the second through fifth iterations of the last user, expressed as a percentage of a single core. Perfmon was used for Native, esxtop for ESX, and xentop for XenServer. ESX uses less CPU than XenServer no matter how the comparison is made: for a given number of users, or for a given total latency:

[Figure: average CPU utilization vs. number of users, as a percentage of a single core]

XenApp and other products that virtualize applications are prime candidates to be run in a VM. These results show that ESX can do so efficiently compared to using a physical machine. This was shown with a benchmark that: represents a real desktop workload, uses a metric that includes latencies of all operations, and requires that all operations complete successfully. Furthermore, ESX supports about 13% more users than XenServer at a given latency while using less CPU.

16 thoughts on “Virtualizing XenApp on XenServer 5.0 and ESX 3.5”

  1. Tarry Singh

    Playing the devil’s advocate here: clearly native works far better than ESX or XenServer. Why on earth don’t we just leave the Citrix Farms alone? The same may apply to Oracle [RAC] and other I/O-intensive apps.
    Why are we virtualizing anyway? Just push them all into the Cloud and let Grid, Clusters, and all other concoctions handle the workload. It’s all about the workload and SLA, right?
    Disclaimer: I love and fully propose virtualization to firms that are madly buying hardware because of some moron architects and other vested political interests who want to spend the tax dollars of citizens [we're talking about Gov installations I've come across in the past 8 years of mad consulting in Virtualization across the planet] because “the budget for this year must be finished no matter how irresponsible I will be with those tax dollars”.
    Having said that, I also prefer to hold discussions with those who just don’t get it and want to go ahead with the apps as they are. I’ve been in the industry for almost two decades and was in middleware before we started the virtualization game.

  2. Kenon Owens

    Disclaimer: I am an employee of VMware, but the opinions in this comment are my own.
    Tarry, I think the reason people are starting to virtualize their XenApp and Oracle and other applications is that today the hardware is beginning to outpace the software. Meaning, I don’t get much benefit going from, say, a 4-core system to an 8-core system (I certainly don’t get two times the performance). So as I compare like-configured boxes (with respect to hardware specs), I do have some performance overhead with a virtualized solution, but on a current architecture box I can get more VMs that can do different things on the same box, and I am not wasting resources.

  3. Jeff Buell

    Duncan,
    TPS? The test used only about 7 GB, so none of the platforms were under memory pressure.
    Tarry,
    Kenon is right. Many people don’t even use more than 2 CPUs for XenApp. So even with some virtualization overhead, you’ll come out way ahead by running several XenApp VMs on a single box.

  4. Bryan Salek

    And let’s not forget that virtualization is not just about consolidation. Virtualization is a valuable mechanism to increase availability, perform workload leveling, enable rapid scale out, and simplify disaster recovery. I’ve seen customers virtualizing workloads that utilize an entire physical host for just these reasons.

  5. Jack

    Although VMware overall still delivers better performance than XenServer and Hyper-V when speaking about regular server virtualization, this is not true concerning XenApp servers. All the papers I have read so far from independent vendors/companies demonstrate that. You can check, for example, the papers written by http://www.virtualrealitycheck.com. We’ve run two tests with customers of ours, and each time we managed to put more users on the XenServer machine than on the VMware one.

  6. Robert

    I can tell you that these results are exactly opposite of what we are seeing in production environments. XenServer far outperforms ESX, even for MSOffice users.

  7. Pulporange

    I think there is a lot of improvement for vmware now that they support the RVI (AMD) feature for huge-memory applications.
    As everyone knows, XenApp and W$ make lots of memory calls …
    qed

  8. Michael

    I confirm Nico’s comment – we did production testing with users simultaneously accessing XenApp as they do in the real world (i.e., not scripting one test after another completes as in your scenario), and it showed that XenServer *far* outpaced ESX… it wasn’t even close, even after patching/tuning. I don’t deny you captured these results, but the methodology is flawed or purposely devised. I’m a VMW bigot, but our testing came to be from colleagues at two different organizations who confirm similar results. We still like ESX and I don’t see it going away anytime soon, but XenServer is, simply put, the better choice for XenApp.

  9. Kevin

    We have done our own testing similar to this, just with ESX vs. native, and our results were very different. In our test the physical server (native) far, far outperformed the single VM. To me, this data showing a single VM and native being so close is laughable.

  10. Simon Bramfitt

    Considering the fact that VMware insists on reviewing test methodologies before anyone publishes test results for ESX performance: did the testers consult with Citrix and Microsoft before releasing these results?
    Thought not.

Comments are closed.