

The Performance Cost of SMP – The Reason for Rightsizing

Rightsizing is an important operational process, and it directly affects performance.  VMware recognizes this and recommends the use of vCenter Operations to assist in identifying undersized and oversized workloads within your infrastructure.  Rightsizing helps ensure maximum performance of your workloads and efficient use of the underlying hardware.  It is very easy today to add resources to a virtual machine when required, so we need to get away from our habit of overprovisioning.

Many people wonder whether there is overhead in creating a large virtual machine with many vCPUs that may or may not be used.  Is there waste in doing that?  Why not make all my virtual machines oversized and let the vSphere scheduler sort it out?

The simple answer is that – yes – if you build an inefficient virtual machine with more vCPUs than it will use, there is waste.  If the workload is rightsized, though, you will still maintain a high level of efficiency.

Let’s take a look at the data:

[Chart: benchmark score, VM CPU utilization, and relative ‘CPU Efficiency’ for the tested vCPU configurations]

Here we did a simple lab test using a single-threaded, CPU-intensive process as the fixed workload.  The benchmark was then run using multiple virtual machines with different vCPU configurations.  Each VM ran only a single thread of the CPU-intensive process, but additional vCPUs were assigned to the virtual machine and left idle, simulating an oversized virtual machine.
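To make the setup concrete, here is a minimal sketch in Python of this kind of fixed, single-threaded, CPU-intensive load.  It is purely illustrative – not the actual benchmark used in the experiment.  Each worker process busy-loops on one vCPU, so running one worker inside a multi-vCPU VM mimics the oversized case, while running one worker per vCPU mimics the rightsized case.

import multiprocessing
import sys
import time

def burn(seconds: float) -> int:
    """Busy-loop on one vCPU for roughly `seconds`; the iteration
    count serves as a crude per-thread benchmark score."""
    deadline = time.monotonic() + seconds
    score = 0
    while time.monotonic() < deadline:
        score += 1
    return score

if __name__ == "__main__":
    # Number of concurrent CPU-intensive threads to run inside the VM.
    workers = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    with multiprocessing.Pool(workers) as pool:
        scores = pool.map(burn, [60.0] * workers)
    print(f"{workers} worker(s), total score: {sum(scores)}")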

The resulting data demonstrates that ‘CPU Efficiency’ decreases as the virtual machines are assigned additional idle vCPUs.  This highlights the fact that there is some small amount of waste, but it does not become visible until very large virtual machine configurations are underutilized.

Next, we repeated the same benchmark, but this time ensured that each additional vCPU assigned to the virtual machine was also running the CPU-intensive process.  This simulated a rightsized virtual machine that was using all of its vCPUs.

The resulting data demonstrates that ‘CPU Efficiency’ remained constant as the virtual machines were scaled up.  This highlights the fact that there is no measurable waste when virtual machines are in fact using all of their assigned vCPUs.

Special thanks to my teammate Joey Dieckhans for this data.  Please note this experiment was meant solely to demonstrate scheduler efficiency with a purely CPU-intensive workload.  When you start to consider the additional impact of storage, memory, or network, it becomes even more important to rightsize to maintain efficiency and reduce waste.

The takeaway here is that we should strive to rightsize our workloads and grow them as required.  Tools like vCenter Operations help automate this process by monitoring the environment and providing sizing recommendations in easy-to-consume reports.  This ensures you can get maximum value from your hardware investment and provide excellent service levels to your customers.

@vmMarkA

This entry was posted in Performance by Mark Achtemichuk.

About Mark Achtemichuk

Mark Achtemichuk is a Senior Technical Marketing Architect specializing in Performance within the Cloud Infrastructure Marketing group at VMware. Certified as VCDX #50, @vmMarkA has a strong background in datacenter infrastructures, cloud architectures, experience implementing enterprise application environments and a passion for solving problems. He has driven virtualization adoption and project success by methodically bridging business with technology. His current challenge is ensuring that performance is no longer a barrier, perceived or real, to virtualizing an organization's most critical applications, on their journey to the cloud.

10 thoughts on “The Performance Cost of SMP – The Reason for Rightsizing”

  1. Frank Brix Pedersen

    I don’t come to the same conclusion as you. Your data shows that right-sizing vCPUs does not really matter. Even though an idle vCPU consumes some resources, it is not a lot. I agree that you should always try to give a VM the number of vCPUs that makes sense.

    Right-sizing with memory is MUCH more important to avoid ballooning, swapping, vmkernel swap and memory compression.

    1. Sebastian Greco

      Hi Frank!
      Keeping the CPU efficiency constant is not the same as saying “it doesn’t matter”. This test was run with no other VMs on the host (as I deduce from the host CPU utilization), so factors like CPU scheduling contention that affect performance are not being taken into account, as that was not the objective of this test. Also, the more resources your VM has, the more overhead memory is needed, so physical memory is also affected by the vCPU design. Other factors like NUMA, limits, reservations, HT configuration, etc. are also not taken into account, as the objective of this test was different: proving CPU efficiency in SMP.

      Right-sizing memory… well, yes. But ballooning, swapping, etc. are the result of not having enough physical memory on the host, not the result of overprovisioning virtual memory to your VMs.

      All these resources play an important role. I wouldn’t dare to say that one of them is more important than the other.

      Regards,
      Sebas

      1. Frank Brix Pedersen

        Hi Sebastian,

        I like your findings and I will use them in the future, but I still don’t think you showed a big advantage of right-sizing vCPUs. You only got a minimal performance difference.

        You should try to replicate your data in a more “real world” scenario, not by running multiple 32-vCPU machines using only one vCPU.

        The best and most realistic case you show is the 4 VMs @ 4 vCPUs each, but only running on one core.
        You only lose 3% performance compared to a single-core VM.

        I agree you have proved that there is a difference, but it is not a lot :-)

        1. Mark Achtemichuk (post author)

          Hi Frank. Agreed, the cost is not high, and the point was only to demonstrate that there is some level of waste, so as to encourage rightsizing operational processes. We’re very proud to demonstrate that the margin is very small. This test was very CPU-specific, but I can safely suggest that if we were to take into account other dimensions like I/O and memory, we would see increasing waste. Data to this effect can be expected in the future.

  2. Steve Anderson

    Interesting article, but… your numbers just don’t add up for me. First you say that you’re benchmarking with a single-threaded, CPU-intensive workload. On the most “well sized” 1 vCPU VM, you say you have 10.7% VM CPU utilization. How CPU intensive could it be when it’s only consuming 11% of the VM’s CPU? That’s not intensive at all.

    Next, I take the “Score” to be the amount of work done in some fixed period. In your second table, the score scales up with the number of actively used vCPUs, so I think I’m interpreting the Score correctly. So, the CPU efficiency should be, I believe, the amount of work performed (Score) per amount of host resources used (host CPU utilization).

    In the extra vCPUs idle test (table 1), I get an efficiency of the 1 vCPU case as:
    4406.62 / 12.5 = 352.5

    For the 4 vCPU case (1 active, 3 idle), I get an efficiency of:
    4277.14 / 12.8 = 334.15

    And the relative efficiency of the over-sized VM to the right-sized is:
    334.15 / 352.5 = 94.8%

    So… I’m really not sure how you’re claiming 99% efficiency relative to the right-sized single vCPU case. I’d sacrifice 1% of efficiency for future flexibility and dynamic capacity. I’d more strongly consider configuring for less dynamic capacity if I was told over-sizing would cost me 5%.

    Likewise in the second test (all vCPUs utilized), I see totally different relative efficiency.

    1 vCPU = 4316.91 / 12 = 359.74
    4 vCPU = 17039.58 / 43 = 396.27

    4 vCPU relative efficiency = 396.27 / 359.74 = 110.2%

    Which again is much more compelling than the conclusion that you provided. And it makes sense when you think about it. If you have 4 identical CPU bound concurrent workloads, it’s much more efficient (~10%) to do it in 1 VM with 4 vCPUs than in 4 VMs with 1 vCPU each.

    1. Mark Achtemichuk (post author)

      Hi Steve,

      Appreciate your participation. Let me go into a few more details as I think one of our chart labels and methodology could have been clearer.

      You’ve used the ‘Host CPU Utilization’ column in your work above, whereas we used the ‘VM CPU Utilization’ column. A better description for the column we used would have been ‘VM CPU Utilization as a % of Host Total.’ Using esxtop, we captured the exact number of GHz used by the entire ESXi host (aka Host CPU Utilization) to assist in comparing an idle, wasteful configuration with a saturated, efficient configuration. Simultaneously, we captured the exact number of GHz used by the VM (aka VM CPU Utilization). In our efficiency calculations we used the GHz consumed by the VM, thereby removing the fixed (~2% of the host) hypervisor overhead that we observed throughout the experiment.

      Why did we do that?

      Well, the experiment was done to investigate waste at the VM level. Including hypervisor overhead would distort the efficiency study because at one end of the scale, say 4x1-way VMs, the fixed hypervisor overhead would have a large impact on efficiency, whereas at the other end of the scale, say 4x32-way VMs, the impact is minor.

      The math we’ve done was actually:

      (lots of waste)

      4x1way: Score of 4406.62 / VM CPU Utilization of 10.7% = 100% (baseline relative efficiency)
      4x32way: Score of 4281.85 / VM CPU Utilization of 11.6% = 90%

      The last detail to highlight was that this experiment was run on an Intel 40 core server. So we would expect that the 4x1way VM config, running a CPU intensive workload, would consume 4 out of 40 cores or approximately 10%.
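      In Python terms, the calculation looks like this – a small sketch using only the figures quoted above, not tooling from the experiment itself:

      def relative_efficiency(score, vm_cpu_util, base_score, base_vm_cpu_util):
          # Work done per unit of VM CPU consumed, normalized to the
          # rightsized baseline configuration.
          return (score / vm_cpu_util) / (base_score / base_vm_cpu_util)

      # Idle-vCPU test: 4x32-way (oversized) vs. 4x1-way (baseline).
      print(relative_efficiency(4281.85, 0.116, 4406.62, 0.107))  # ~0.90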

      I hope this helps to explain our methodology and reasoning.

  3. Steve Anderson

    Thank you for the clarification. That does explain how you got your relative efficiencies in the single-threaded case. I still get significantly different numbers in the multi-threaded case.

    Multi-thread 4×1 = 4316.91 / 10 = 431.691
    Multi-thread 4×4 = 17039.58 / 41 = 415.6

    Relative efficiency = 415.6 / 431.691 = 96.2%

    I don’t want to sound petty bickering about 3.8%, but your multi-threaded chart’s straight 100% column implies “no performance impact”, which is something I just don’t see when I do the math.

    1. Mark Achtemichuk (post author)

      Hi Steve,

      To make the chart easier to read, I rounded the ‘VM CPU Utilization’ column, which masks the 14 digits past the decimal point that I continued to use in the calculations. If you use the complete numbers in the calculation, as I was in the spreadsheet, the numbers are correct.

      Example (multi-threaded):

      4×1 Score of 4316.91 / VM CPU Utilization of 0.10465595744681 (or approx 10%) = 41248.583504613315

      4×4 Score of 17039.58 / VM CPU Utilization of 0.41234393617021 (or approx 41%) = 41323.706996302940

      41323.706996302940 / 41248.583504613315 = 1.00182 or 100%
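      A quick Python check (a sketch using only the numbers quoted in this thread) shows how the rounded utilization produces your ~96% while the full-precision values give ~100%:

      # Full precision, as kept in the spreadsheet:
      full_1 = 4316.91 / 0.10465595744681    # ~41248.58
      full_4 = 17039.58 / 0.41234393617021   # ~41323.71
      print(full_4 / full_1)                 # ~1.0018, i.e. ~100%

      # Rounded utilization, as displayed in the chart:
      rounded_1 = 4316.91 / 0.10
      rounded_4 = 17039.58 / 0.41
      print(rounded_4 / rounded_1)           # ~0.96, the apparent 3.8% gap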
