ESX scheduler support for SMP VMs: co-scheduling and more

ESX supports virtual machines configured with multiple virtual CPUs (for example, ESX 3.x supports up to
4 vCPUs). Handling mixed loads of uniprocessor and multiprocessor VMs can be challenging for a
scheduler to get right. This article answers some common questions about deploying multiprocessor VMs,
and describes the algorithms used by the ESX scheduler to provide both high performance and fairness.

When considering multiprocessor VMs, the following questions naturally arise for ESX users:

a) When do I decide to configure multiple vCPUs for a VM?
b) What are the overheads of using multiprocessor VMs? What would I lose by overprovisioning vCPUs
     for VMs?
c) Does the ESX scheduler schedule all of the vCPUs belonging to a VM together (co-schedule them)?
d) Why is co-scheduling necessary and important?
e) How does the ESX scheduler deal with some vCPUs belonging to a VM idling while others actively
     perform work? Do the idle vCPUs unnecessarily burn CPU?

Let’s answer these questions briefly:

a) It makes sense to configure multiple vCPUs for a VM when:
    1. The application you intend to run within the VM is multi-threaded (Apache Web Server, MS
         Exchange 2007, etc.) and its threads can make good use of the additional processors
         provided (multiple threads can be active and running at the same time).
    2. Multiple single-threaded applications are intended to run simultaneously within the VM.

     Running one single-threaded application within a multiprocessor VM will not improve the
     performance of that application, since only one vCPU will be in use at any given time.
     Configuring additional vCPUs in such a case is unnecessary.

b) It’s best to configure only as many virtual CPUs as the application needs to handle its load. In
    other words, don’t overprovision vCPUs that bring no additional application performance.

    Virtual CPUs that are configured but not used still impose resource requirements on the ESX
    Server. In some guest operating systems, the unused virtual CPUs still take timer interrupts,
    which consume a small amount of additional CPU. Please refer to KB articles 1077 and 1730.

c) For scheduling a VM with multiple vCPUs, ESX 2.x used a technique known as ‘Strict Co-scheduling’.
    With strict co-scheduling, the scheduler keeps track of a "skew" value for each vCPU. A vCPU’s skew
    increases if it is not making progress (running or idling) while at least one of its sibling vCPUs
    is making progress.

   When the skew for any vCPU in a VM exceeds a threshold, the entire VM is descheduled. The VM is
   rescheduled only when enough physical processors are available to accommodate all of the VM’s vCPUs.
   Especially on a system with few cores that runs a mix of UP and SMP VMs, this can lead to CPU
   ‘fragmentation’ and therefore to lower overall system utilization. As an example, consider a
   two-core system running a single UP VM and a single two-vCPU SMP VM. While the UP VM’s vCPU is
   scheduled, the other physical processor cannot be used to execute just one of the SMP VM’s two
   vCPUs, so that processor sits idle for that length of time.

   This co-scheduling algorithm was improved to a ‘Relaxed Co-Scheduling’ scheme in ESX 3.x. Under
   this scheme, only the vCPUs that are skewed need to be scheduled, even when fewer physical
   processors are available than the skewed VM has vCPUs. This increases the number of scheduling
   opportunities available to the scheduler and hence improves overall system throughput. Relaxed
   co-scheduling significantly reduces the possibility of co-scheduling fragmentation, improving
   overall processor utilization.
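
   The difference between the two schemes can be illustrated with a small model. This is only a
   sketch under stated assumptions, not the ESX implementation: the function names, the tick-based
   time unit, and the threshold value of 3 are all hypothetical and chosen for illustration. It
   tracks skew as described above and reports which vCPUs must be scheduled together once a VM is
   skewed.

```python
# Illustrative model only: names, the tick unit, and the threshold value are
# assumptions, not ESX internals.
SKEW_THRESHOLD = 3  # hypothetical limit, in scheduler ticks


def update_skew(skew, progressing):
    """Advance one tick.

    skew: dict mapping vCPU name -> accumulated skew.
    progressing: set of vCPU names that made progress (ran or idled) this tick.
    """
    if not progressing:
        return  # no sibling progressed, so nobody falls behind
    for vcpu in skew:
        if vcpu not in progressing:
            skew[vcpu] += 1


def skewed_vcpus(skew, threshold=SKEW_THRESHOLD):
    """Return the vCPUs whose skew exceeds the threshold, in name order."""
    return sorted(v for v, s in skew.items() if s > threshold)


def must_co_start(skew, policy, threshold=SKEW_THRESHOLD):
    """Return the vCPUs that have to be scheduled together once the VM is skewed."""
    lagging = skewed_vcpus(skew, threshold)
    if not lagging:
        return []  # VM is not skewed; no co-start constraint
    if policy == "strict":
        return sorted(skew)  # every vCPU of the VM
    if policy == "relaxed":
        return lagging  # only the vCPUs that fell behind
    raise ValueError(f"unknown policy: {policy!r}")


def can_reschedule(skew, policy, free_pcpus, threshold=SKEW_THRESHOLD):
    """True if enough physical CPUs are free to satisfy the co-start constraint."""
    return len(must_co_start(skew, policy, threshold)) <= free_pcpus


# A 4-vCPU VM in which vcpu3 makes no progress for four ticks.
skew = {"vcpu0": 0, "vcpu1": 0, "vcpu2": 0, "vcpu3": 0}
for _ in range(4):
    update_skew(skew, progressing={"vcpu0", "vcpu1", "vcpu2"})

print(must_co_start(skew, "strict"))
print(must_co_start(skew, "relaxed"))
print(can_reschedule(skew, "strict", free_pcpus=2))
print(can_reschedule(skew, "relaxed", free_pcpus=2))
```

   In this model, with two physical processors free, the strict policy has to wait for four
   processors, while the relaxed policy only needs one for the lagging vCPU and can proceed.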

d) Briefly, co-scheduling (keeping the skew between the vCPUs’ execution times within reasonable
    limits) is necessary so that the guest operating system and the applications within it run both
    correctly and with good performance. Significant skew between the vCPUs of a VM can result in
    both severe performance problems and correctness issues.

    As an example, guest operating systems make use of spin locks for synchronization. If the vCPU
    currently holding a lock is descheduled, the other vCPUs belonging to the VM will burn cycles
    busy-waiting until the lock is released. Similar performance problems can also show up in
    multi-threaded user applications, which may also perform some form of synchronization. Correctness
    issues associated with significant skew between the vCPUs of a VM can cause Windows BSODs or Linux
    kernel panics.
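
    The cost of the spin lock case can be estimated with some simple arithmetic. The sketch below is
    a deliberately pessimistic, hypothetical model: it assumes that every sibling vCPU is contending
    for the same lock and spins for the whole time the lock holder is descheduled. The function name
    and the tick unit are illustrative assumptions.

```python
def wasted_spin_ticks(num_vcpus, holder_descheduled_ticks):
    """Worst-case ticks burned busy-waiting while the lock holder is descheduled.

    Assumes every sibling of the lock holder spins on the same lock for the
    entire time the holder is off the physical CPU.
    """
    if num_vcpus < 1 or holder_descheduled_ticks < 0:
        raise ValueError("need at least one vCPU and a non-negative duration")
    waiting_siblings = num_vcpus - 1  # everyone except the lock holder
    return waiting_siblings * holder_descheduled_ticks


# A 4-vCPU VM whose lock holder is descheduled for 10 ticks.
print(wasted_spin_ticks(4, 10))  # 3 siblings * 10 ticks = 30
```

    A uniprocessor VM never pays this cost in the model, since it has no siblings to spin.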

e) Idle vCPUs (vCPUs on which the guest is executing the idle loop) are detected by ESX and descheduled
    so that they free up a processor that can be productively used by some other active vCPU.
    Descheduled idle vCPUs are considered to be making progress in the skew detection algorithm. As a
    result, for co-scheduling decisions, idle vCPUs do not accumulate skew and are treated as if they
    were running. This optimization ensures that idle guest vCPUs don’t waste physical processor
    resources, which can instead be allocated to other VMs. For example, an ESX Server with two
    physical cores may be running one vCPU each from two different VMs, if their sibling vCPUs are
    idling, without incurring any co-scheduling overhead. Similarly, in the fragmentation example
    above, if one of the SMP VM’s vCPUs is idling, then there will be no co-scheduling fragmentation,
    since its sibling vCPU can be scheduled concurrently with the UP VM.
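
    The two-core examples above can be checked with a short sketch. It is a simplified, hypothetical
    model rather than ESX code: each VM is represented as a list of booleans (True meaning that the
    vCPU is idle), and an idle vCPU is assumed to need no physical processor because it is
    descheduled while still counting as making progress.

```python
def pcpus_needed(vcpu_is_idle):
    """Physical CPUs a VM needs to run all of its non-idle vCPUs together.

    vcpu_is_idle: list of booleans, one per vCPU (True = guest is in its idle loop).
    Idle vCPUs are descheduled and therefore need no physical CPU.
    """
    return sum(1 for idle in vcpu_is_idle if not idle)


def can_run_concurrently(total_pcpus, vms):
    """True if every VM's active vCPUs fit on the physical CPUs at the same time."""
    return sum(pcpus_needed(vcpus) for vcpus in vms.values()) <= total_pcpus


# Fragmentation example: UP VM plus a fully busy two-vCPU SMP VM on two cores.
busy = {"up_vm": [False], "smp_vm": [False, False]}
print(can_run_concurrently(2, busy))

# Same system, but one of the SMP VM's vCPUs is idling.
one_idle = {"up_vm": [False], "smp_vm": [False, True]}
print(can_run_concurrently(2, one_idle))

# Two SMP VMs, each with one active and one idle vCPU, on two cores.
two_smp = {"smp_a": [False, True], "smp_b": [True, False]}
print(can_run_concurrently(2, two_smp))
```

    Under these assumptions only the first case fails to fit: the fully busy SMP VM cannot run
    alongside the UP VM, which is the fragmentation described in c).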

To summarize, the ESX scheduler supports SMP VMs while providing both high performance and fairness.
ESX users should take advantage of this SMP support to improve the performance of their applications
by configuring only as many vCPUs for a VM as the application load actually needs.

For a broader technical overview of the ESX co-scheduling algorithms described above, please also
refer to the “Co-scheduling SMP VMs in VMware ESX Server” blog.