Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5

VMware vSphere minimizes virtualization overhead so that it is not noticeable for a wide range of applications, including most business-critical applications such as database systems, Web applications, and messaging systems. vSphere also works well for applications with millisecond-level latency constraints, including VoIP services. However, the performance demands of latency-sensitive applications with very low latency requirements, such as distributed in-memory data management, stock trading, and high-performance computing, have long been thought to be incompatible with virtualization.

vSphere 5.5 includes a new feature for setting latency sensitivity in order to support virtual machines with strict latency requirements. This per-VM feature allows virtual machines to exclusively own physical cores, thus avoiding overhead related to CPU scheduling and contention. A recent performance study shows that using this feature combined with pass-through mechanisms such as SR-IOV and DirectPath I/O helps to achieve near-native performance in terms of both response time and jitter.

The paper explains the major sources of added latency due to virtualization in vSphere, details how the latency-sensitivity feature improves performance, and presents evaluation results for the feature. It also presents some best practices drawn from the performance evaluation.

For more information, please read the full paper: Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5.



21 comments have been added so far

  1. On page 6 you say,
    Also, we recommend over-provisioning CPU to reduce contention; the number of VCPUs in the host should be less than the number of PCPUs to leave one or more PCPUs for VMkernel threads for I/O processing and system management

    Is that correct? You recommend over-provisioning?

    1. Yes, we are recommending over-provisioning. This is to achieve two things:

      1. The latency-sensitivity feature performs best when a latency-sensitive VM gets exclusive access to PCPUs. Over-provisioning increases the chances of this. However, as explained in the white paper, the best way is to give a latency-sensitive VM a 100% CPU reservation, which guarantees it exclusive PCPU access.

      2. Once PCPUs are exclusively owned by latency-sensitive VMs, they cannot be used by VMkernel threads and user-level processes. So it is recommended to leave one or more PCPUs for those.

      1. I think it’s badly worded. Surely you mean that you recommend over-provisioning the PHYSICAL host with more PCPUs, not the virtual machine?
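The sizing rule discussed in this thread (fewer vCPUs than PCPUs on the host, with one or more PCPUs left for VMkernel threads) comes down to simple arithmetic. A minimal Python sketch follows; the function names and the example host sizes are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def spare_pcpus(total_pcpus: int, vm_vcpus: Sequence[int]) -> int:
    """PCPUs left over if every vCPU on the host were given its own PCPU."""
    return total_pcpus - sum(vm_vcpus)


def sizing_ok(total_pcpus: int, vm_vcpus: Sequence[int],
              reserved_for_vmkernel: int = 1) -> bool:
    """True if at least `reserved_for_vmkernel` PCPUs stay free.

    The default of 1 reflects the "one or more PCPUs" wording quoted
    above; the right number for a real host is an assumption to tune.
    """
    return spare_pcpus(total_pcpus, vm_vcpus) >= reserved_for_vmkernel


if __name__ == "__main__":
    # Hypothetical host: 16 PCPUs, two VMs with 8 and 4 vCPUs.
    print(spare_pcpus(16, [8, 4]))  # PCPUs left for VMkernel and other work
    print(sizing_ok(16, [8, 4]))
    print(sizing_ok(12, [8, 4]))    # nothing left over on a 12-PCPU host
```

The check only counts cores; it says nothing about whether the scheduler actually grants exclusive access.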

  2. Do you have a spam issue on this site; I also am a blogger, and I
    was curious about your situation; many of us have created some nice procedures and we are looking
    to trade solutions with others, be sure to shoot me an e-mail if interested.

  3. Is it possible that I’ve done this incorrectly? When I set the CPU and memory reservations to their maximum setting, I can run a specific database process (one that reads 50000 records) on my VM in 11 seconds. When I change the Latency Sensitivity mode from NORMAL to HIGH, I notice two things:
    1) The CPU Used, in the vSphere Client Summary Screen, changes from 5600 (two full cores at 2.8GHz) to a low number, like 56. It is almost like the reservation that I gave it disappears.
    2) The same database process now takes 19s to process 50000 requests.

    Now, I believe that the increase in time has more to do with the lack of CPU reservation, but I cannot figure out exactly WHY this is an issue. Overall, my box has 32GB of RAM (24 in use), and I have 12 logical processors, with 16 vCPUs across all the guest VMs in total. Of these, I have created reservations in only ONE VM (the database VM); all other reservations are 0. Further, most other boxes are pretty idle: vSphere reports that I am using 1.1GHz out of 33.60GHz.

    Has anyone provided any additional testing or clarification regarding this feature?

    1. I’m assuming that you don’t use a pass-through mechanism such as SR-IOV. If that’s the case, the performance degradation you are observing might be because the workload you are running has a high packet rate or a large packet size (larger than the MTU size). The feature disables both VNIC coalescing and LRO, and turning these off can badly hurt performance in such cases (high packet rate and/or large packet size).

      However, it’s hard to tell conclusively what’s going on with the given information. If you’d like to figure out exactly what is happening on your system, you can file a support request: https://www.vmware.com/support/file-sr/. Our support team has a systematic way of collecting the necessary stats and diagnosing the problem.


  4. Could you explain how memory and CPU reservations actually interact with the latency sensitivity setting? According to the document “Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5”, the latency sensitivity feature requires a memory reservation, but does that mean the “Reserve all guest memory” option must be checked (100%), or is a lower value enough? When you switch the latency sensitivity option to High, the vSphere web client gives a yellow warning: “Check CPU reservation – VM needs enough CPU reservation to power on when the level is set to High”. The above-mentioned VMware document strongly recommends maximizing CPU reservations, but does it need to be 100% (total vCPU of VM * clock frequency of CPU), or would the latency sensitivity setting work with a lower setting?

    The document “Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5” states that with the latency sensitivity feature enabled, the CPU scheduler determines whether exclusive access to PCPUs can be given or not considering various factors including whether PCPUs are over-committed or not. Is there any way to check if the conditions are met or not?

  5. 1. You need to reserve all guest memory to use the latency-sensitivity feature. Otherwise, you won’t be able to power on the VM.

    2. About CPU, you don’t have to reserve 100% CPU to use the feature. However, reserving 100% CPU increases the chance of having exclusive access to PCPUs.

    3. The information about whether exclusive access to PCPUs can be given or not is kept internally in the CPU scheduler. It is not exposed to the user.
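The "100%" CPU reservation mentioned in the question above is described there as total vCPUs of the VM times the CPU clock frequency. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the 2-vCPU, 2.8 GHz figures are the ones an earlier commenter reported (5600 MHz for two full cores at 2.8GHz).

```python
def full_cpu_reservation_mhz(num_vcpus: int, core_mhz: int) -> int:
    """Full reservation as described in the thread: vCPUs x core clock (MHz)."""
    return num_vcpus * core_mhz


def reservation_fraction(reserved_mhz: int, num_vcpus: int, core_mhz: int) -> float:
    """Share of the full reservation that is currently configured (0.0 to 1.0)."""
    return reserved_mhz / full_cpu_reservation_mhz(num_vcpus, core_mhz)


if __name__ == "__main__":
    # Two vCPUs on 2.8 GHz cores, as in the earlier comment.
    print(full_cpu_reservation_mhz(2, 2800))    # 5600
    print(reservation_fraction(2800, 2, 2800))  # 0.5, i.e. half reserved
```

Per the reply above, a reservation below 100% still allows the feature to be used; it only lowers the chance of exclusive PCPU access.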

  6. Hi,

    Does it help to enable CPU or memory reservations if the ESX host runs only 2 VMs that host the same apps/functions with the same priority? Both are equally aggressive and CPU intensive, and both are large-memory VMs. Thanks

    Best Regards,

    1. Hi, I have two questions.
      1. Must the latency-sensitivity feature be used with a pass-through mechanism such as SR-IOV? Does the feature work if a VDS is used?
      2. When the latency-sensitivity feature is enabled and enough CPU and memory are reserved, why do some vCPUs get exclusive PCPUs while others do not?
      Sometimes it works, but right now some PCPUs are not exclusive.

      There are 2 VMs: one is given 8 vCPUs and the other is given 4 vCPUs. The host has two sockets with 8 cores. Enough memory is reserved.
      vcpu vm type name uptime status usedsec syssec wait waitsec idlesec readysec htsharing min max units shares group emin cpu mode affinity
      873110 873110 U vmx-vcpu-0:AE_M 56916.151 WAIT 12435.294 0.000 IDLE 44403.032 44121.812 21.417 any 0 -1 pct 4000 vm.835595 5136 10 2 0-31
      873111 873110 U vmx-vcpu-1:AE_M 56916.141 RUN 11877.831 0.000 NONE 0.755 0.047 1.188 any 0 -1 pct 4000 vm.835595 5136 30 2 30
      873112 873110 U vmx-vcpu-2:AE_M 56916.138 RUN 55873.072 0.000 NONE 14.439 11.025 1.219 any 0 -1 pct 4000 vm.835595 5136 17 2 17
      873113 873110 U vmx-vcpu-3:AE_M 56916.137 RUN 4335.076 0.000 NONE 16.391 12.977 1.117 any 0 -1 pct 4000 vm.835595 5136 18 2 18

      873162 873162 U vmx-vcpu-0:AE_M 56899.632 WAIT 8357.088 0.000 IDLE 48438.294 48280.499 15.737 any 0 -1 pct 8000 vm.835594 13752 6 2 0-31
      873163 873162 U vmx-vcpu-1:AE_M 56899.623 RUN 8081.814 0.000 NONE 0.847 0.049 0.994 any 0 -1 pct 8000 vm.835594 13752 28 2 28
      873164 873162 U vmx-vcpu-2:AE_M 56899.623 RUN 56401.365 0.000 NONE 5.446 0.048 0.915 any 0 -1 pct 8000 vm.835594 13752 23 2 23
      873165 873162 U vmx-vcpu-3:AE_M 56899.622 RUN 1152.279 0.000 NONE 5.399 0.049 0.940 any 0 -1 pct 8000 vm.835594 13752 20 2 20
      873166 873162 U vmx-vcpu-4:AE_M 56899.622 RUN 1167.588 0.000 NONE 5.402 0.049 0.904 any 0 -1 pct 8000 vm.835594 13752 27 2 27
      873167 873162 U vmx-vcpu-5:AE_M 56899.621 RUN 1152.916 0.000 NONE 5.341 0.049 0.917 any 0 -1 pct 8000 vm.835594 13752 24 2 24
      873168 873162 U vmx-vcpu-6:AE_M 56899.621 WAIT 1523.631 0.000 IDLE 54486.893 54427.653 178.460 any 0 -1 pct 8000 vm.835594 13752 4 2 0-31
      873169 873162 U vmx-vcpu-7:AE_M 56899.620 WAIT 1872.768 0.000 IDLE 53306.191 53227.926 508.152 any 0 -1 pct 8000 vm.835594 13752 19 2 0-31
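In the listing above, the rows in RUN state show a single PCPU in the last (affinity) column, while the WAIT rows show 0-31. A small parser makes that split easy to see. The sketch below assumes the column order from the header line in the comment. Treating a single-PCPU affinity as a sign of exclusive placement is an assumption made for illustration, not documented scheduler behavior, and `parse_rows` and `is_pinned` are hypothetical helper names.

```python
# Column names copied from the header line of the pasted listing.
HEADER = ("vcpu vm type name uptime status usedsec syssec wait waitsec "
          "idlesec readysec htsharing min max units shares group emin "
          "cpu mode affinity").split()


def parse_rows(text: str) -> list:
    """Split whitespace-separated listing rows into dicts keyed by HEADER."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header itself, and anything malformed.
        if len(fields) != len(HEADER) or fields[0] == "vcpu":
            continue
        rows.append(dict(zip(HEADER, fields)))
    return rows


def is_pinned(row: dict) -> bool:
    """True if the affinity column names one PCPU rather than a range."""
    return row["affinity"].isdigit()


# Two rows copied verbatim from the comment above.
SAMPLE = """
873110 873110 U vmx-vcpu-0:AE_M 56916.151 WAIT 12435.294 0.000 IDLE 44403.032 44121.812 21.417 any 0 -1 pct 4000 vm.835595 5136 10 2 0-31
873111 873110 U vmx-vcpu-1:AE_M 56916.141 RUN 11877.831 0.000 NONE 0.755 0.047 1.188 any 0 -1 pct 4000 vm.835595 5136 30 2 30
"""

if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row["name"], row["status"], row["affinity"], is_pinned(row))
```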

      1. 1. Using a passthrough device is not required for the latency-sensitivity setting; the setting can be used with VDS.
        2. You need to have 100% CPU reservation to guarantee exclusive PCPU access. Otherwise, exclusive PCPU access might not be given, depending on the CPU scheduler’s decision.

    2. Reserving 100% of CPU guarantees exclusive CPU access, which helps to improve performance. You also need to reserve memory to use the latency-sensitivity setting.

      1. CPU and memory are 100% reserved, and the VMs used to get exclusive access to PCPUs, but recently this has stopped working. The problem is as follows:
        1. On host A, VM A is scheduled with exclusive PCPU access on NUMA1, and VM B is scheduled with exclusive PCPU access on NUMA0.
        2. On host B, some of VM A's vCPUs are scheduled on NUMA1, and the rest are scheduled on NUMA0 and cannot get exclusive PCPU access.

        The host has two sockets with 8 cores. VM A: 4 cores; VM B: 8 cores.
        The vCPU speed is equal to the PCPU speed, with the “Pay as you go” resource model.

        I guess that in the second case there is not enough CPU when both VMs are scheduled on one NUMA node at the same time. Can you help to explain the reason?
        And how can we solve the problem? We are not considering the NUMA affinity setting, for the sake of cloud flexibility.

  7. Dear Mr Jin Heo,

    I’m a software developer working for LAWO in Germany, and I’ve written software for streaming RTP audio at very low latency and low jitter, for exchanging audio with hardware devices that provide a jitter buffer of only about 20 millisecs. The audio packets are AES67 compliant, so one UDP packet contains 1 millisecond of audio or even less. My software runs on Windows, and although this is not a real-time OS, it works quite well on physical machines in a 24/7 appliance.

    Now we have requests to run this application in a virtual environment, and we decided to start with VMware as the host system. Unfortunately, we don't achieve the desired jitter of < 3 msecs over a longer period of several hours. I've read your white papers about extremely latency-sensitive applications and about Voice over IP. Switching the latency sensitivity to High brought some improvement, but there are still too many jitter events of up to 50 or 100 millisecs. On real hardware we use Resplendence DPC latencymon to check the suitability of the system, but in the virtual machine this tool shows a very high "Interrupt to process latency" of 24 millisecs and above, and I don't know whether this is a valid tool for virtual environments.

    We also tried to find consulting experts for our needs, but this also seems to be hard. Do you have some tips for us, either for setting up such a system (do you even think it's possible?) or for a consulting company that is experienced in this topic?

    Best regards,
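Jitter figures like the ones in this comment can be measured in more than one way, which matters when comparing a 3 ms target against occasional 50 to 100 ms events. The Python sketch below shows two common measures computed from per-packet send and receive timestamps: the smoothed interarrival jitter estimator defined in RFC 3550, and the plain spread between the largest and smallest transit time. This is a swapped-in illustration, not the method used by the commenter's software or by the latency tool mentioned above; the function names and sample timestamps are hypothetical.

```python
from typing import Sequence


def interarrival_jitter(send_ms: Sequence[float], recv_ms: Sequence[float]) -> float:
    """Smoothed jitter per RFC 3550: J += (|D| - J) / 16 for each packet pair.

    D is the change in transit time between consecutive packets.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def max_delay_variation(send_ms: Sequence[float], recv_ms: Sequence[float]) -> float:
    """Worst-case spread in transit time; this is what a jitter buffer must absorb."""
    transit = [r - s for s, r in zip(send_ms, recv_ms)]
    return max(transit) - min(transit)


if __name__ == "__main__":
    # Hypothetical trace: packets sent every 1 ms, the third arrives 50 ms late.
    send = [0.0, 1.0, 2.0]
    recv = [5.0, 6.0, 57.0]
    print(interarrival_jitter(send, recv))  # 3.125
    print(max_delay_variation(send, recv))  # 50.0
```

The smoothed estimator damps a single late packet heavily, so a worst-case measure is the more relevant one when sizing against a fixed jitter buffer.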
