Highlights of New Features and Improvements in vSphere 7/vSAN 7/NSX-T 3 for High Performance Computing and Machine Learning

vSphere 7.0 is the biggest release of vSphere in over a decade, with a completely new design to power the modernization of both infrastructure and applications. The headline news is that vSphere now supports Kubernetes natively, so you can run VMs and containers on the same platform. In addition, vSphere 7.0, together with vSAN 7.0 and NSX-T 3.0, introduces numerous new features and performance improvements for applications ranging from business-critical workloads to AI/ML, IoT, and NFV. In this blog, we highlight the features and improvements most closely related to High-Performance Computing (HPC) and Machine Learning/Deep Learning (ML/DL) workloads.

What’s new in vSphere 7

vSphere with Kubernetes

In the data science community, containers are the dominant mechanism for developing, packaging, and deploying models. The highlight of vSphere 7 is native support for Kubernetes, the most popular platform for orchestrating containers and deploying AI/ML pipelines. vSphere with Kubernetes bridges the worlds of developers and admins through a new vSphere construct called a Namespace. A Namespace defines a logical set of resources, permissions, and policies that enables system admins to perform application-centric management at scale. At the same time, AI/ML developers and users gain access to the infrastructure through familiar Kubernetes APIs. To learn more about vSphere with Kubernetes, read these two blogs.
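
To make the idea of a Namespace more concrete, here is a minimal Python sketch that bundles the three ingredients named above (resource limits, permissions, and storage policies) and admits a workload only if all three allow it. The class, its fields, and the admission rule are hypothetical illustrations, not the vSphere API.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Namespace:
    """Toy model of a vSphere Namespace (all field names are hypothetical)."""

    name: str
    cpu_limit_mhz: int                                        # resources
    memory_limit_mib: int
    editors: Set[str] = field(default_factory=set)            # permissions
    storage_policies: Set[str] = field(default_factory=set)   # policies
    cpu_used_mhz: int = 0
    memory_used_mib: int = 0

    def admit(self, user: str, cpu_mhz: int, memory_mib: int, policy: str) -> bool:
        """Reserve resources for a workload if the user, policy, and limits all allow it."""
        if user not in self.editors or policy not in self.storage_policies:
            return False
        if self.cpu_used_mhz + cpu_mhz > self.cpu_limit_mhz:
            return False
        if self.memory_used_mib + memory_mib > self.memory_limit_mib:
            return False
        self.cpu_used_mhz += cpu_mhz
        self.memory_used_mib += memory_mib
        return True


ns = Namespace("ml-team", cpu_limit_mhz=20000, memory_limit_mib=65536,
               editors={"alice"}, storage_policies={"gold"})
print(ns.admit("alice", 8000, 32768, "gold"))   # True: fits all limits
print(ns.admit("bob", 1000, 1024, "gold"))      # False: no permission
print(ns.admit("alice", 16000, 1024, "gold"))   # False: would exceed the CPU limit
```

The point of the sketch is that the admin sets the envelope once per Namespace, while developers consume capacity inside it without touching individual hosts.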

Assignable hardware

Today, HPC and AI/ML workloads increasingly rely on hardware accelerators, such as GPUs, InfiniBand interconnect HCAs, and FPGAs, to speed up computation and data processing. Assignable Hardware (AH) is a new feature in vSphere 7 that makes support for hardware accelerators more flexible. By identifying each device by a set of attributes rather than by its physical PCI address, AH enables initial VM placement by vSphere DRS, as well as High Availability (HA), for VMs equipped with DirectPath I/O (also called passthrough) PCIe devices or NVIDIA vGPUs. In vSphere 7, customers have three options for using hardware accelerators: legacy DirectPath I/O, Dynamic DirectPath I/O, and NVIDIA vGPU.

A feature related to AH is Hardware Labels. When using Dynamic DirectPath I/O, users can optionally assign a customizable label to each device, so that AH matches a VM to a device by that label instead of by its attributes. This adds extra flexibility to device mapping.
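
The matching logic behind Assignable Hardware can be illustrated with a small Python sketch: a VM asks for a device by vendor and device ID (or by Hardware Label), and any host with a free matching device becomes a placement candidate, whatever PCI address the device occupies. The data model and function names are hypothetical and the IDs are placeholders; this is not how DRS or HA are implemented internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Device:
    host: str
    pci_address: str              # physical location; deliberately NOT used for matching
    vendor_id: int
    device_id: int
    label: Optional[str] = None   # optional Hardware Label
    in_use: bool = False


@dataclass(frozen=True)
class DeviceRequest:
    vendor_id: Optional[int] = None
    device_id: Optional[int] = None
    label: Optional[str] = None   # when set, match on the label instead of attributes


def matches(dev: Device, req: DeviceRequest) -> bool:
    if dev.in_use:
        return False
    if req.label is not None:
        return dev.label == req.label
    return dev.vendor_id == req.vendor_id and dev.device_id == req.device_id


def candidate_hosts(devices: List[Device], req: DeviceRequest) -> List[str]:
    """Hosts that could receive the VM, regardless of which PCI slot the device sits in."""
    return sorted({d.host for d in devices if matches(d, req)})


# Illustrative inventory; the IDs are placeholders, not real product IDs.
devices = [
    Device("esx01", "0000:3b:00.0", 0x10DE, 0x1111),
    Device("esx02", "0000:af:00.0", 0x10DE, 0x1111, in_use=True),
    Device("esx03", "0000:5e:00.0", 0x10DE, 0x1111, label="training-gpu"),
    Device("esx03", "0000:d8:00.0", 0x15B3, 0x2222),
]
print(candidate_hosts(devices, DeviceRequest(vendor_id=0x10DE, device_id=0x1111)))
# ['esx01', 'esx03']
print(candidate_hosts(devices, DeviceRequest(label="training-gpu")))
# ['esx03']
```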

Note that AH requires VM Hardware version 17. For demos and examples, read this blog.

VMware Bitfusion in vSphere

VMware Bitfusion is now available as a beta. Hardware accelerators such as GPUs are expensive and scarce resources on many of today’s HPC and AI/ML platforms. This leads to two restrictions: 1) only a small fraction of users can access these devices, and 2) HPC and AI/ML workloads must run on the specific set of hosts that have these devices attached. VMware Bitfusion shifts this paradigm by allowing workloads to run anywhere while redirecting the accelerated portion of the code to run on a remote, accelerator-attached host. It currently focuses on deep learning training rather than on more general accelerator use cases. With VMware Bitfusion, admins can create a pool of accelerators that serves a broader population of AI/ML users efficiently and flexibly.
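
As a rough illustration of the pooling idea, the Python sketch below models a shared pool of GPUs on remote servers that client VMs without local accelerators can lease and return. It is a hypothetical model: it does not reflect Bitfusion's actual interfaces or how the accelerated code is redirected over the network.

```python
from typing import Dict, Optional, Tuple


class AcceleratorPool:
    """Toy pool of GPUs on remote, accelerator-attached servers (hypothetical model)."""

    def __init__(self, servers: Dict[str, int]):
        self.free: Dict[str, int] = dict(servers)          # server -> free GPUs
        self.leases: Dict[str, Tuple[str, int]] = {}       # client -> (server, GPUs)

    def acquire(self, client: str, gpus: int) -> Optional[str]:
        """Lease GPUs for a client VM that may have no local accelerator."""
        if client in self.leases:
            return None
        candidates = [s for s, n in self.free.items() if n >= gpus]
        if not candidates:
            return None
        server = max(candidates, key=lambda s: self.free[s])  # server with the most free GPUs
        self.free[server] -= gpus
        self.leases[client] = (server, gpus)
        return server

    def release(self, client: str) -> None:
        server, gpus = self.leases.pop(client)
        self.free[server] += gpus


pool = AcceleratorPool({"gpu-srv-a": 4, "gpu-srv-b": 2})
print(pool.acquire("notebook-vm-1", 2))   # gpu-srv-a
print(pool.acquire("notebook-vm-2", 4))   # None: no server has 4 free GPUs
pool.release("notebook-vm-1")
print(pool.free)                          # {'gpu-srv-a': 4, 'gpu-srv-b': 2}
```

Because leases are returned to the pool, the same few GPUs can serve many users over time instead of sitting idle inside one user's VM.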

Improved DRS

Instead of focusing on cluster-level balancing, the new DRS algorithm in vSphere 7 takes a workload-centric approach to placement decisions, aiming to keep VMs “happy”. DRS quantifies VM happiness with a VM DRS score, which considers resource contention metrics such as CPU Ready Time and memory swapping, as well as the resource headroom available for application bursts. For each VM, DRS calculates a VM DRS score on every host and uses vMotion to maximize the overall score across all VMs. DRS now runs every minute instead of every five minutes, giving finer granularity and better responsiveness. We expect the improved DRS to deliver better resource allocation for HPC and AI/ML workloads running on vSphere clusters.
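
The workload-centric idea can be sketched in a few lines of Python: compute a per-VM score on every host from contention and headroom proxies, then put each VM where its score is highest. The scoring formula, weights, and thresholds below are invented for illustration and are not VMware's VM DRS score; the sketch also performs only greedy initial placement, whereas DRS additionally uses vMotion to rebalance running VMs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Host:
    name: str
    cpu_capacity_mhz: float
    mem_capacity_mib: float
    cpu_demand_mhz: float = 0.0
    mem_demand_mib: float = 0.0


def vm_score(host: Host, vm_cpu_mhz: float, vm_mem_mib: float) -> float:
    """Hypothetical 0-100 'happiness' of a VM on a host (NOT VMware's formula).

    CPU utilization above 80% stands in for CPU Ready Time, memory
    overcommit stands in for swapping, and spare CPU counts as burst headroom.
    """
    cpu_util = (host.cpu_demand_mhz + vm_cpu_mhz) / host.cpu_capacity_mhz
    mem_util = (host.mem_demand_mib + vm_mem_mib) / host.mem_capacity_mib
    score = 90.0 + 10.0 * max(0.0, 1.0 - cpu_util)   # headroom bonus
    score -= 200.0 * max(0.0, cpu_util - 0.8)        # CPU contention penalty
    score -= 500.0 * max(0.0, mem_util - 1.0)        # swapping penalty
    return max(0.0, min(100.0, score))


def place(hosts: List[Host], vms: Dict[str, tuple]) -> Dict[str, str]:
    """Greedily put each VM on the host where its score is highest."""
    placement = {}
    # Place the largest CPU consumers first.
    for name, (cpu, mem) in sorted(vms.items(), key=lambda kv: -kv[1][0]):
        best = max(hosts, key=lambda h: vm_score(h, cpu, mem))
        best.cpu_demand_mhz += cpu
        best.mem_demand_mib += mem
        placement[name] = best.name
    return placement


hosts = [Host("esx01", 20000, 65536), Host("esx02", 20000, 65536, cpu_demand_mhz=15000)]
vms = {"train-1": (8000, 16384), "train-2": (8000, 16384), "infer-1": (2000, 4096)}
placement = place(hosts, vms)
print(placement)   # {'train-1': 'esx01', 'train-2': 'esx01', 'infer-1': 'esx02'}
```

With these inputs, both training VMs land on the idle host, and the small inference VM goes to the busier one because adding it to esx01 would push that host further past the 80% CPU threshold than adding it to esx02.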

vMotion enhancements

The updated vMotion includes several performance improvements; as a result, it now supports live migration of VMs running large databases and mission-critical workloads (also called monster VMs), as well as the large VMs used for HPC and ML workloads. First, page tracers now run on a dedicated vCPU instead of on all vCPUs inside the VM, which reduces interference with application performance. Second, updating page table entries is more efficient thanks to Huge Pages support in vSphere. Last but not least, the bitmap that records modified memory pages has been made significantly smaller, which minimizes switch-over time. The benefits of these three enhancements can be seen in the figure below. Read this blog for more details.
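
To give a sense of why the modified-page bitmap matters for monster VMs, the Python snippet below computes the size of a flat bitmap that uses one bit per 4 KiB guest page. This is plain arithmetic under that assumption; it does not show how vSphere 7 actually shrinks the bitmap, which this post does not detail.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB


def dirty_bitmap_bytes(vm_memory_bytes: int, page_size_bytes: int = 4 * KIB) -> int:
    """Size of a flat dirty-page bitmap that uses one bit per guest memory page."""
    pages = vm_memory_bytes // page_size_bytes
    return (pages + 7) // 8   # 8 pages per byte, rounded up


# 64 GiB -> 2 MiB, 1 TiB -> 32 MiB, 24 TiB -> 768 MiB
for memory in (64 * GIB, 1 * TIB, 24 * TIB):
    size = dirty_bitmap_bytes(memory)
    print(f"{memory // GIB:>6} GiB VM -> {size / MIB:g} MiB bitmap")
```

The bitmap grows linearly with VM memory, so for a hypothetical 24 TiB VM a naive bitmap reaches hundreds of MiB, which is why shrinking it pays off most at switch-over for the largest VMs.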

Support of Precision Time Protocol (PTP)

Precise timekeeping is a critical requirement for certain HPC applications, because precise, high-resolution timestamps are needed to reconstruct the correct sequence of events. For example, financial applications must timestamp transactions with enough granularity and accuracy to determine the order of trades from multiple clients. To deliver predictable quality of service to such time-critical applications, ESXi 7 introduces support for the Precision Time Protocol (PTP). PTP plays the same role as the Network Time Protocol (NTP), but it has distinctive features that make it better suited to achieving sub-microsecond accuracy.
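
The core of PTP's delay request-response exchange is a small piece of arithmetic over four timestamps, shown below as a Python sketch under the usual assumption of a symmetric network path. NTP relies on similar four-timestamp arithmetic; in practice, much of PTP's higher accuracy typically comes from capturing these timestamps in NIC hardware rather than in software. The sketch is generic IEEE 1588 math, not ESXi's PTP daemon.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> Tuple[float, float]:
    """Delay request-response arithmetic from IEEE 1588 (timestamps in ns).

    t1: master sends Sync            (master clock)
    t2: slave receives Sync          (slave clock)
    t3: slave sends Delay_Req        (slave clock)
    t4: master receives Delay_Req    (master clock)
    Assumes the network path delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2   # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # mean one-way path delay
    return offset, delay


# Simulated exchange: slave clock runs 150 ns ahead, one-way delay is 500 ns.
true_offset, path_delay = 150, 500
t1 = 1_000_000
t2 = t1 + path_delay + true_offset
t3 = 1_002_000
t4 = (t3 - true_offset) + path_delay
print(ptp_offset_and_delay(t1, t2, t3, t4))   # (150.0, 500.0)
```

The slave then subtracts the computed offset from its clock; any asymmetry between the two path directions shows up directly as an offset error.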

To use this new feature, first enable the PTP daemon on an ESXi 7 host in the vSphere UI: Host -> Configure -> System -> Services -> PTP Daemon: Enable. Then add the new virtual device called PrecisionClock to your target VM. This keeps the VM synchronized to the VMkernel system time, which is backed by PTP on the host. Note that the VM must use VM Hardware version 17 for PTP.

Simplified lifecycle management

The complexity of managing an HPC or AI/ML cluster grows with the scale of the cluster. vSphere 7 introduces the next generation of vSphere Lifecycle Manager (vLCM), which lets you manage the lifecycle of your cluster declaratively. Admins can now use vCenter Server Profiles to standardize the configuration of all their vCenter Servers, and use Cluster Image Management to create images that define how the hosts in a cluster are configured and updated. Furthermore, vCenter Server Update Planner provides automatic update notifications and monitors VMware product interoperability, so you know whether an update is compatible with other VMware products you already run, such as NSX, before you start updating. Read this blog for more details.
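
The declarative model boils down to comparing a desired image with what each host actually runs and reporting the differences. The Python sketch below shows that comparison; the component names, versions, and data structures are hypothetical and do not represent the vLCM image format or API.

```python
from typing import Dict, Optional, Tuple

Image = Dict[str, str]   # component name -> version


def compute_drift(desired: Image, actual: Image) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {component: (installed, desired)} for every component that differs."""
    drift = {}
    for component in sorted(desired.keys() | actual.keys()):
        have, want = actual.get(component), desired.get(component)
        if have != want:
            drift[component] = (have, want)
    return drift


def cluster_compliance(desired: Image, hosts: Dict[str, Image]) -> Dict[str, dict]:
    """Per-host drift report; an empty dict means the host is compliant."""
    return {host: compute_drift(desired, image) for host, image in hosts.items()}


# Hypothetical component names and versions.
desired = {"esxi": "7.0.0", "nic-driver": "1.2", "bmc-firmware": "3.1"}
hosts = {
    "esx01": {"esxi": "7.0.0", "nic-driver": "1.2", "bmc-firmware": "3.1"},
    "esx02": {"esxi": "7.0.0", "nic-driver": "1.1", "bmc-firmware": "3.1", "old-agent": "0.9"},
}
for host, drift in cluster_compliance(desired, hosts).items():
    print(host, "compliant" if not drift else drift)
```

A remediation step would then install the desired version of each drifted component and remove anything the image does not list, bringing every host back to the same declared state.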

Intrinsic security and control

vSphere Trust Authority (vTA) enhances application security with remote attestation for sensitive workloads. vTA creates a hardware root of trust using a small, separately managed cluster of ESXi hosts. This attestation mechanism ensures that all of your hosts are running authentic software and remain in a valid configuration.
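
Attestation rests on comparing measurements of what a host is running against known-good values. The Python sketch below shows only that comparison step, using SHA-256 digests; it is a hypothetical simplification, since a real scheme such as vTA relies on measurements backed by hardware (for example, a TPM) rather than on values a host computes and reports about itself.

```python
import hashlib
import hmac
from typing import Dict


def measure(components: Dict[str, bytes]) -> Dict[str, str]:
    """SHA-256 'measurement' of each software component on a host."""
    return {name: hashlib.sha256(blob).hexdigest() for name, blob in components.items()}


def attest(reported: Dict[str, str], trusted: Dict[str, str]) -> bool:
    """Accept a host only if it reports exactly the known-good set of measurements."""
    if reported.keys() != trusted.keys():
        return False
    return all(hmac.compare_digest(reported[name], trusted[name]) for name in trusted)


# Known-good measurements, e.g. taken from a golden host (hypothetical content).
golden = {"kernel": b"vmkernel build A", "driver": b"nic driver v1"}
trusted = measure(golden)

print(attest(measure(golden), trusted))                                   # True
print(attest(measure({**golden, "driver": b"patched driver"}), trusted))  # False
```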

In addition, other security enhancements in vSphere 7 include vCenter Server authentication with external Identity Federation, simplified certificate management, and support for Intel Software Guard Extensions. Refer to this blog for more details.

What’s new in vSAN 7

Integrated File Services

vSAN 7 unifies storage operations by providing both block and file services. Integrated file services make it easier to provision and share files: users can now provision a file share from their vSAN cluster and access it via NFS v4.1 or NFS v3. When it is used to replace the enterprise NFS servers that are common in HPC clusters, the benefits include simplified management and reduced cost.

Enhanced Cloud-Native Storage

vSAN now supports file-based persistent volumes for Kubernetes on vSAN datastores. Developers can dynamically create file shares for their applications and have multiple pods share the same data. Read this blog for more details.
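
For example, a developer might request a shared file volume with a PersistentVolumeClaim that uses the ReadWriteMany access mode, which lets multiple pods, even on different nodes, mount the same volume read-write. The Python sketch below builds such a manifest as JSON; the storage class name vsan-file-sc and the script name make_pvc.py are hypothetical and depend on how the admin has configured storage for vSAN file services.

```python
import json


def shared_file_pvc(name: str, size_gi: int, storage_class: str) -> dict:
    """Kubernetes PersistentVolumeClaim for a volume that many pods can mount read-write."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],   # file volume shared across pods
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
            "storageClassName": storage_class,  # hypothetical class backed by vSAN file services
        },
    }


# kubectl accepts JSON manifests, e.g.: python make_pvc.py | kubectl apply -f -
print(json.dumps(shared_file_pvc("shared-datasets", 100, "vsan-file-sc"), indent=2))
```

Each training or preprocessing pod would then reference the claim by name in its volumes section, so all of them see the same dataset files.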

Simplified Lifecycle Management

vLCM delivers a unified lifecycle workflow for the full HCI server stack: vSphere, vSAN, drivers and OEM server firmware. Furthermore, vLCM constantly monitors cluster configuration and automatically remediates compliance drift.

What’s new in NSX-T 3

Enhanced data path

VMware continues to improve data path performance in NSX-T. This release introduces zero-copy TX support in the Enhanced Network Stack (ENS), FPO Flow Director offload to the NIC, and ENS performance improvements related to cache utilization and packet sizes. Together, these features and optimizations further improve networking performance for latency-sensitive workloads.

AMD EPYC support

NSX-T 3.0 now supports Edge nodes, in both VM and bare-metal form factors, on the following AMD EPYC CPUs:

  • AMD EPYC 7xx1 Series (Naples)
  • AMD EPYC 3000 Embedded Family and newer
  • AMD EPYC 7xx2 Series (Rome)

References

  1. vSphere 7 blogs central repository
  2. What’s new with vSAN 7
  3. What’s new in VMware Cloud Foundation 4
  4. vSphere 7 Release Notes
  5. NSX-T Data Center 3.0 Release Notes

Author: Michael Cui, OCTO