ESX scheduler support for SMP VMs: co-scheduling and more

ESX supports virtual machines configured with multiple virtual CPUs (for example, ESX 3.x supports up to
4 vCPUs). Handling mixed loads of uniprocessor and multiprocessor VMs can be challenging for a
scheduler to get right. This article answers some common questions about deploying multiprocessor VMs,
and describes the algorithms used by the ESX scheduler to provide both high performance and fairness.

When considering multiprocessor VMs, the following questions naturally arise for ESX users:

a) When should I configure multiple vCPUs for a VM?
b) What are the overheads of using multiprocessor VMs? What would I lose by overprovisioning vCPUs
     for VMs?
c) Does the ESX scheduler schedule all of the vCPUs belonging to a VM together (co-scheduling)?
d) Why is co-scheduling necessary and important?
e) How does the ESX scheduler deal with some vCPUs belonging to a VM idling while others actively
     perform work? Do the idle vCPUs unnecessarily burn CPU?

Let’s answer these questions briefly:

a) It makes sense to configure multiple vCPUs for a VM when:
    1. The application you intend to run within the VM is multi-threaded (Apache Web Server, MS
         Exchange 2007, etc.) and its threads can make good use of the additional processors
         (that is, multiple threads can be active and running at the same time).
    2. Multiple single-threaded applications are intended to run simultaneously within the VM.

     Running one single-threaded application within a multiprocessor VM will not improve performance
     of that application, since only one vCPU will be in use at any given time. Configuring additional
     vCPUs in such a case is unnecessary.

b) It’s best to configure only as many virtual CPUs as the application needs to handle its load. In other
    words, don’t overprovision vCPUs unless they are needed for additional application performance.

    Virtual machines configured with virtual CPUs that are not used still impose resource
    requirements on the ESX Server. In some guest operating systems, the unused virtual CPUs still
    take timer interrupts, which consume a small amount of additional CPU. Please refer to KB
    articles 1077 and 1730.

c) For scheduling a VM with multiple vCPUs, ESX 2.x used a technique known as ‘Strict Co-scheduling’.
    With strict co-scheduling, the scheduler keeps track of a "skew" value for each vCPU. A vCPU’s skew
    increases if it is not making progress (running or idling) while at least one of its sibling vCPUs is
    making progress.

   When the skew for any vCPU in a VM exceeds a threshold, the entire VM is descheduled. The VM is
   rescheduled only when enough physical processors are available to accommodate all of the VM’s vCPUs.
   Especially on a system with few cores running a mix of UP and SMP VMs, this can lead to CPU
   ‘fragmentation’ and, as a result, relatively lower overall system utilization. As an example, consider a
   two-core system running a single UP VM and a single two-vCPU SMP VM. While the vCPU belonging to the
   UP VM is scheduled, the other physical processor cannot be used to execute just one of the SMP VM’s
   two vCPUs, so that physical CPU idles for that length of time.

   This co-scheduling algorithm was improved to a ‘Relaxed Co-Scheduling’ scheme in ESX 3.x, wherein
   only the vCPUs that are skewed need to be scheduled, even when fewer physical processors are available
   than the skewed VM has vCPUs. This scheme increases the number of scheduling opportunities available
   to the scheduler and hence improves overall system throughput. Relaxed co-scheduling significantly
   reduces the possibility of co-scheduling fragmentation, improving overall processor utilization.
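
   The difference between the two schemes can be sketched in a few lines of Python. This is only an
   illustration of the rules as described above, not ESX code: the names VCpu, SKEW_THRESHOLD,
   pcpus_needed and can_schedule are hypothetical, the threshold value and its unit are made up, and
   the sketch assumes that a VM with no vCPU over the threshold can run any one vCPU on its own.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold in arbitrary units; the real value is not given in this article.
SKEW_THRESHOLD = 3


@dataclass
class VCpu:
    name: str
    skew: int = 0


def pcpus_needed(vcpus: List[VCpu], relaxed: bool, threshold: int = SKEW_THRESHOLD) -> int:
    """Number of free physical CPUs required before the VM can be scheduled again."""
    skewed = [v for v in vcpus if v.skew > threshold]
    if not skewed:
        # Assumption: a VM that is not skewed can run a single vCPU by itself.
        return 1
    if relaxed:
        # Relaxed co-scheduling: only the skewed vCPUs must be scheduled.
        return len(skewed)
    # Strict co-scheduling: all vCPUs of the VM must be scheduled together.
    return len(vcpus)


def can_schedule(free_pcpus: int, vcpus: List[VCpu], relaxed: bool) -> bool:
    """True if enough physical CPUs are free under the chosen scheme."""
    return free_pcpus >= pcpus_needed(vcpus, relaxed)


# Fragmentation example: two cores, one of them busy with the UP VM, so one core is free.
smp_vm = [VCpu("vcpu0", skew=5), VCpu("vcpu1", skew=0)]
strict_ok = can_schedule(1, smp_vm, relaxed=False)   # False: the free core stays idle
relaxed_ok = can_schedule(1, smp_vm, relaxed=True)   # True: the skewed vCPU can run
```

   With one core free, the strict rule leaves that core idle until a second one frees up, while the
   relaxed rule lets the lagging vCPU run and catch up.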

d) Briefly, co-scheduling (keeping the skew between the vCPUs’ execution times within reasonable
    limits) is necessary so that the guest operating system and the applications within it run both
    correctly and with good performance. Significant skew between the vCPUs of a VM can
    result in both severe performance and correctness issues.

    As an example, guest operating systems make use of spin locks for synchronization. If the vCPU
    currently holding a lock is descheduled, then the other vCPUs belonging to the VM will burn cycles
    busy-waiting until the lock is released. Similar performance problems can also show up in
    multi-threaded user applications, which may also perform some form of synchronization. Correctness
    issues associated with significant skew between the vCPUs of a VM can cause Windows BSODs or Linux
    kernel panics.

e) Idle vCPUs, that is, vCPUs on which the guest is executing the idle loop, are detected by ESX and
    descheduled so that they free up a processor that can be productively utilized by some other active
    vCPU. Descheduled idle vCPUs are considered as making progress in the skew detection algorithm. As a
    result, for co-scheduling decisions, idle vCPUs do not accumulate skew and are treated as if they were
    running. This optimization ensures that idle guest vCPUs don’t waste physical processor resources,
    which can instead be allocated to other VMs. For example, an ESX Server with two physical cores may
    be running one vCPU each from two different VMs, if their sibling vCPUs are idling, without incurring
    any co-scheduling overhead. Similarly, in the fragmentation example above, if one of the SMP VM’s
    vCPUs is idling, then there will be no co-scheduling fragmentation, since its sibling vCPU can be
    scheduled concurrently with the UP VM.
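
The skew rule, including the treatment of idle vCPUs, can be written down as a small Python sketch.
It follows the description in this article literally and is not taken from ESX: the State enum, the
update_skew function and the one-unit-per-tick increment are assumptions made for illustration, and
the article does not say how skew is reduced again, so that part is left out.

```python
from enum import Enum
from typing import List


class State(Enum):
    RUNNING = "running"   # executing on a physical CPU
    IDLE = "idle"         # guest idle loop; descheduled, but counts as making progress
    WAITING = "waiting"   # has work to do but is not on a physical CPU


def update_skew(states: List[State], skew: List[int]) -> List[int]:
    """Return the new per-vCPU skew of one VM after one accounting tick.

    A vCPU gains skew only if it is not making progress while at least one
    sibling is. Running and idle vCPUs both count as making progress.
    """
    progressing = [s in (State.RUNNING, State.IDLE) for s in states]
    if not any(progressing):
        # No vCPU is progressing, so no vCPU falls behind another.
        return list(skew)
    # Hypothetical increment of one unit per tick for each vCPU that lags.
    return [k if p else k + 1 for k, p in zip(skew, progressing)]


# One vCPU runs while its sibling idles: no skew accumulates.
idle_case = update_skew([State.RUNNING, State.IDLE], [0, 0])

# One vCPU runs while its sibling has work but no physical CPU: the sibling lags.
waiting_case = update_skew([State.RUNNING, State.WAITING], [0, 0])
```

In the first case the skew stays at zero for both vCPUs, which is why a two-vCPU VM with one idle
vCPU can share a two-core host with a UP VM without triggering co-scheduling.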

To summarize, the ESX scheduler supports SMP VMs while providing both high performance and fairness. ESX
users should leverage this SMP support to improve the performance of their applications by
configuring only as many vCPUs for a VM as the application load really needs.

For a broader technical overview of the ESX co-scheduling algorithms described above, please also refer to
the “Co-scheduling SMP VMs in VMware ESX Server” blog.


6 comments have been added so far

  1. Excellent post – the only other consideration (out of scope of the topic, but important to the topic) is the impact that vCPUs have on VM HA slot calculations. Using vCPUs rapidly changes the ESX cluster availability constraints (this is good design, good behavior, but something end-users need to consider as they consider vCPU use).

  2. Is there any plan to increase SMP greater than 4? I am doing some testing trying to move an application from physical to virtual. The application has a SQL back end and there are comparisons (jobs) that get done. The software allows you to specify how many processors you want to use for those comparisons. Using a Dell 2950 (dual processor quad core Intel Xeon 2.66) physical server allows me to use 6 processors for comparisons while leaving 2 cores for OS and SQL, and this configuration results in around 3 minute comparison times.
    Well, I virtualized that server (2 vCPU) and then another server (4 vCPU) using a Dell 2950. The software also allows you to use processors from another machine to decrease the comparison times. So as a test I used one virtual server for 2 CPUs and the other for 4 CPUs, giving me a total of 6 CPUs (like the physical) to do the comparisons. My comparison times are around 15-45 seconds longer on the virtual servers than on the physical servers. I also tried using 4 vCPUs on both servers so the server with the application would have 2 vCPUs for OS and 2 vCPUs for comparisons, while the other 4 vCPU server used all 4 vCPUs for comparisons, which is still the same total of 6 vCPUs to process comparisons. The results were the same as the configuration of 2 vCPUs in one and 4 vCPUs in the other.
    Maybe with a 6 or 8 way SMP these ID times on the virtual servers will go down. Any suggestions on increasing performance on the virtual side to at least get around the same comparison times as the physical?

  3. Good post indeed – makes it easier to get an overview of the process behind, thereby increasing the chance of success.
    Also, Jason Boche has a very good point indeed!
    Chad: First of all, going from Physical to Virtual, but on the same HW, will not produce the same results. First of all because there IS indeed an overhead loss in terms of RAW CPU power available. Second of all because you have changed the entire deal, by being dependent on a networked “calculation cluster”. I don’t know how your application does this over the network, but I would guess that there is indeed a penalty running 2 hosts with 4 cores each over the network, vs. 1 host running 8 cores on the same box, being physical OR virtual.
    The physical setup you describe also has 2 cores less than the first virtual example, giving less power available overall no matter what.
    The second virtual environment gives no better results, despite the equalization of processor power between the physical and virtual – I would account for that being the above two problems:
    1) Virtualization overhead.
    2) Network latency.
    On the whole subject of having A LOT of vCPUs in virtual hosts, well, I ALWAYS have 2 vCPUs no matter what. Mainly because the physical HW usually having 16 cores available in the systems I use (ty for Quadcore!) makes sure I usually run out of memory before cores. Having 2 CPUs in one box mimics any physical environment – ask a server admin any day if he would accept a single CPU box for anything serious in his network – the answer would be NO! Usually because we want a user app to run smoothly at the same time we do all kinds of other tasks on the server to keep it running (patching, backup, AV scan, etc.). Also, most serious apps need to have more horses to pull the load to perform well…
    I have actually experienced clients asking for virtualization on some VERY CPU hungry systems, even though the economy doesn’t quite hold up. This is caused by the advantage of having things virtualized – HA and the detachment from HW (firmware upgrades cost no downtime because of VMotion, and no reinstallation is needed in case of a HW/platform upgrade to a newer and faster x86/x64 platform). I’ve even seen an ESX to VM ratio of 1 to 1 on occasion.
    Oh, and expect to see 8 or more vCPU configurations available soon from VMware – wouldn’t imagine anything else. 😀

  4. oh, and I actually thought that the “strict scheduling scheme” of ESX2.x was still in use on ESX3 – didn’t know they had improved on it.
    You learn something everyday! 😀
