
What’s New in Performance for VMware vSphere 7.x?

Underlying each release of VMware vSphere are many performance and scalability improvements. The vSphere 7.x platform continues to provide industry-leading performance and features to ensure the successful virtualization and management of your entire software-defined datacenter.

The latest What’s New in Performance technical paper covers VMware vSphere 7.0, U1, U2, and U3. Some highlights include:

  • Introduced in vSphere 7.0
    • Selective latency sensitivity: This new feature allows a VM to pin a subset of its vCPUs to individual cores for greater performance. It enhances the existing VM latency-sensitivity setting, which pins all of the VM's vCPUs to cores. Pinning only a specified subset of vCPUs reduces the resource requirements while maintaining the needed performance. In our tests, workloads running on pinned vCPUs perform as well or better with the setting applied to individual vCPUs as with the setting applied to the whole VM.
    • vMotion enhancements: These include memory pre-copy optimizations, loose page trace installs, improved page table granularity, and switch-over phase enhancements. The features are designed to support the vMotion of “monster” VMs (those with a large number of vCPUs), but they benefit vMotion of VMs of all sizes.
    • Precision clock device: Maintains accurate timekeeping across the cluster for time-sensitive workloads.
  • Introduced in vSphere 7.0 U1
    • Paravirtual RDMA and native endpoints: Enhances the performance for applications and clusters that use RDMA to communicate with storage devices and arrays.
    • Monster VM enhancements: Features include widening of the physical address, address space optimizations, better NUMA awareness for guest operating systems, and more scalable synchronization techniques. VMs can now be sized up to 768 vCPUs and 24TB of RAM. ESXi hosts with AMD processors can support VMs with twice the previous number of vCPUs (256), and up to 8TB of RAM.
    • vMotion enhancements: Dramatic improvements in performance when migrating VMs. Among other changes, memory pre-copy has been rearchitected around an innovative page-tracing mechanism. Together, these changes greatly reduce the performance impact on guest workloads during live migration and significantly shorten vMotion duration. Performance data from Tier-1 workloads show these optimizations can scale to hundreds of vCPUs and terabytes of memory.
    • NVMe over Fabrics (NVMe-oF): A protocol specification that connects hosts to high-speed flash storage via network fabrics using the NVMe protocol. The fabrics that vSphere 7.0 U1 supports include Fibre Channel (FC-NVMe) and RDMA (RoCE v2). The benchmark results show that FC-NVMe consistently outperforms SCSI FCP in vSphere virtualized environments, providing higher throughput and lower latency.
  • Introduced in vSphere 7.0 U2
    • Enterprise NVIDIA infrastructure support: The NVIDIA Ampere architecture enables you to run high-end AI/ML training and ML inference workloads on the accelerated capacity of the A100 GPU. vSphere support for A100 GPUs delivers world-class AI performance: up to 20X the performance of previous-generation GPUs. You also get near-bare-metal performance, along with technologies such as GPUDirect communications that enable higher performance for scale-out workloads (adding more VMs to the vSphere system).
    • Performance improvements for AMD Zen CPUs: vSphere 7.0 U2 includes a CPU scheduler that is architecturally optimized for AMD EPYC. This scheduler is designed to take advantage of the multiple last-level caches (LLCs) per CPU socket offered by the AMD EPYC processors. An extensive performance evaluation using both enterprise benchmarks and microbenchmarks shows that the CPU scheduler in vSphere 7.0 U2 achieves up to 50% better performance on these processors than vSphere 7.0 U1. AMD Zen CPU optimizations allow a higher number of VMs or container deployments with better performance.
    • Latency-sensitive workload optimizations: Latency-sensitive workloads, such as those in financial and telecom applications, can see a significant performance benefit from I/O latency and jitter optimizations in ESXi 7.0 U2.
  • Introduced in vSphere 7.0 U3
    • More latency-sensitive workload optimizations: ESXi 7.0 U3 has been further optimized so that ultra-low-latency applications, specifically edge-based, real-time applications, perform better with reduced jitter and interference.
    • vSphere Memory Monitoring and Remediation (vMMR): DRAM accounts for roughly 50-60% of server cost, and the share does not scale linearly: 1TB of DRAM accounts for roughly 75% of server cost. There is therefore a strong incentive to reduce DRAM cost. One solution is Intel Optane Persistent Memory in Memory Mode, in which the hardware uses DRAM as a hidden cache and exposes PMem as the system's main memory. vMMR monitors hosts and VMs running in this mode so that you can determine whether a workload has regressed because of it.
    • Support for NVMe over TCP/IP: Allows ubiquitous TCP/IP networking infrastructure to carry storage traffic using NVMe, a protocol that is better optimized for flash and SSDs. With this advancement, organizations can achieve higher performance and lower latency at a reduced cost.
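
To make the resource argument behind selective latency sensitivity concrete, here is a minimal sketch in Python. It is a toy model, not an ESXi API: the function name, its parameters, and the assumption that each pinned vCPU occupies exactly one physical core are illustrative only.

```python
from typing import Iterable, Optional


def exclusive_cores_needed(num_vcpus: int,
                           pinned_vcpus: Optional[Iterable[int]] = None) -> int:
    """Count the cores a VM would hold exclusively in this toy model.

    pinned_vcpus=None models the whole-VM setting (every vCPU pinned).
    Otherwise only the listed vCPU indices are pinned.
    """
    if num_vcpus <= 0:
        raise ValueError("num_vcpus must be positive")
    if pinned_vcpus is None:
        return num_vcpus
    selected = set(pinned_vcpus)  # duplicates count once
    out_of_range = sorted(v for v in selected if not 0 <= v < num_vcpus)
    if out_of_range:
        raise ValueError(f"vCPU indices out of range: {out_of_range}")
    # Assumption: each pinned vCPU occupies exactly one physical core.
    return len(selected)


# Hypothetical 16-vCPU VM in which only 4 vCPUs run latency-critical threads.
whole_vm = exclusive_cores_needed(16)
selective = exclusive_cores_needed(16, pinned_vcpus=[0, 1, 2, 3])
print(f"whole-VM: {whole_vm} cores, selective: {selective} cores")
```

Under these assumptions, pinning only the latency-critical vCPUs leaves the remaining cores available to other VMs on the host.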

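The vMotion items above refer to memory pre-copy. As a rough illustration of the general idea, the following Python sketch models a generic iterative pre-copy loop: copy all memory once, then keep re-sending the pages the guest dirtied during the previous round until the remainder is small enough to switch over. This is a simplified textbook model, not VMware's implementation, and every name and parameter in it is hypothetical.

```python
from typing import List, Tuple


def simulate_precopy(total_pages: int,
                     dirty_rate: int,
                     copy_rate: int,
                     switchover_threshold: int,
                     max_rounds: int = 30) -> Tuple[List[int], int]:
    """Toy model of iterative memory pre-copy.

    dirty_rate and copy_rate are in pages per second (assumed constant).
    Returns the pages sent in each round and the pages left for switch-over.
    """
    if copy_rate <= 0:
        raise ValueError("copy_rate must be positive")
    remaining = total_pages
    rounds: List[int] = []
    for _ in range(max_rounds):
        if remaining <= switchover_threshold:
            break  # small enough to send during the brief switch-over pause
        rounds.append(remaining)
        # Pages dirtied while this round was in flight must be sent again.
        # Integer arithmetic: (remaining / copy_rate) seconds * dirty_rate.
        remaining = min(total_pages, remaining * dirty_rate // copy_rate)
    return rounds, remaining


rounds, remaining = simulate_precopy(
    total_pages=1_000_000, dirty_rate=50_000,
    copy_rate=500_000, switchover_threshold=1_000)
print(rounds, remaining)
```

In this model the loop converges only when the copy rate exceeds the guest's dirty rate, which is one reason large VMs with many vCPUs are the hard case for live migration.
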
For more new features and enhancements, read What’s New in Performance for vSphere 7.