In vSphere with VMware Cloud Foundation (VCF) 9.0, the surface area of ESX that can be live patched has been dramatically expanded. It now includes the vmkernel, user-space daemons, and NSX components, as well as the existing live patch capable virtual machine execution runtime (vmx).
Use live patch today!
The recently released ESX 9.0.0.0100 patch is live patch capable (release notes). You can use live patch to apply this patch to ESX 9.0.0 (24755229) clusters.
For a refresher on the initial release of live patch with VMware vSphere 8 Update 3, see https://blogs.vmware.com/cloud-foundation/2024/07/11/vmware-vsphere-live-patch/.
In a nutshell: Live patch allows some ESX patches to be applied in a non-disruptive manner, without evacuating a host of its virtual machines.
This means we expect more future patches to be able to use live patch for quick, non-disruptive patching. Important security patches are the primary target for live patch, because organisations need to adopt them as quickly as possible. Not every patch will be live patch capable.
Reminder: The patch release notes state whether a patch is live patch capable. The VCF and vSphere Lifecycle Manager user interfaces also indicate when a patch is live patch capable.
Patching a user-space daemon may require restarting that daemon. Depending on which daemon is patched and restarted, the ESX host might briefly lose its connection to vCenter. For example, patching the hostd daemon may require a hostd restart, which can cause the host to briefly appear disconnected from vCenter. This is expected and does not impact the virtual machines.
If live patch targets the virtual machine execution runtime (vmx), virtual machines undergo a fast-suspend-resume (FSR) operation during live patch. Not all patches require VMs to perform FSR operations. In vSphere with VCF 9.0, FSR is significantly faster for vGPU-enabled virtual machines, allowing clusters hosting large vGPU-enabled VMs to be live patched without disrupting AI/ML applications.
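To make the two paragraphs above concrete, here is a minimal Python sketch that maps the components a patch touches to the side effects this post says *may* occur. It is a hypothetical model for illustration only: the `Component` enum and `possible_effects` function are not part of any VMware API, and whether an effect actually occurs depends on the specific patch.

```python
from enum import Enum


class Component(Enum):
    """Areas of ESX that VCF 9.0 can live patch, as listed in this post (hypothetical model)."""
    VMKERNEL = "vmkernel"
    USER_SPACE_DAEMON = "user-space daemon"
    NSX = "NSX component"
    VMX = "vmx"


def possible_effects(patched: set) -> list:
    """Return side effects that *may* occur, in a fixed order.

    The post describes no visible effect for vmkernel or NSX patches,
    so none is modelled for them here.
    """
    effects = []
    if Component.USER_SPACE_DAEMON in patched:
        # e.g. a hostd restart: the host may briefly appear disconnected; VMs are unaffected.
        effects.append("daemon restart; host may briefly appear disconnected from vCenter")
    if Component.VMX in patched:
        effects.append("running VMs may undergo a fast-suspend-resume (FSR)")
    return effects


print(possible_effects({Component.VMX, Component.USER_SPACE_DAEMON}))
```

The point of the sketch is that the impact depends on *which* component is patched, not on the fact that a live patch is being applied.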
What is Fast-Suspend-Resume (FSR)?
A virtual machine FSR is a non-disruptive operation that is already used when adding or removing virtual hardware devices on powered-on virtual machines. FSR has a lot in common with vMotion. The biggest difference is that FSR is a local live migration, meaning it happens within the same ESX host, so the memory pages never leave that host.
Some virtual machines are not compatible with FSR. VMs configured with vSphere Fault Tolerance, VMs using DirectPath I/O, and vSphere Pods cannot use FSR and must be remediated manually, either by migrating the virtual machine or by power cycling it.
VMs participating in a shared-disk clustering configuration (e.g. Microsoft SQL Server VMs participating in a failover cluster instance, or FCI) do not support FSR operations.
FSR is not related to virtual machine suspend-to-memory or suspend-to-disk operations.
The vSphere Lifecycle Manager compliance scan will report virtual machines that are incompatible with FSR and the reason why. Having incompatible VMs on the host does not block live patch.
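The compatibility rules above can be sketched as a small classification step for a patch that targets vmx. This is a minimal, hypothetical Python model, not how vSphere Lifecycle Manager is implemented: the `VM` fields, `fsr_incompatibility_reasons`, and `plan_remediation` are illustrative names. It mirrors the documented behaviour that incompatible VMs are reported with a reason but do not block live patch.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical VM model; each flag is an FSR-incompatible configuration named in this post."""
    name: str
    fault_tolerance: bool = False
    directpath_io: bool = False
    vsphere_pod: bool = False
    shared_disk_cluster: bool = False  # e.g. SQL Server FCI


def fsr_incompatibility_reasons(vm: VM) -> list:
    """Return every reason this VM cannot use FSR (empty list if it can)."""
    reasons = []
    if vm.fault_tolerance:
        reasons.append("vSphere Fault Tolerance")
    if vm.directpath_io:
        reasons.append("DirectPath I/O")
    if vm.vsphere_pod:
        reasons.append("vSphere Pod")
    if vm.shared_disk_cluster:
        reasons.append("shared-disk clustering")
    return reasons


def plan_remediation(vms: list) -> dict:
    """Split VMs into those handled by FSR and those needing manual remediation.

    Incompatible VMs are reported, not treated as blockers: live patch still proceeds.
    """
    fsr, manual = [], {}
    for vm in vms:
        reasons = fsr_incompatibility_reasons(vm)
        if reasons:
            manual[vm.name] = reasons  # migrate or power cycle these VMs
        else:
            fsr.append(vm.name)
    return {"fsr": fsr, "manual": manual}


plan = plan_remediation([
    VM("web01"),
    VM("sql-fci-1", shared_disk_cluster=True),
    VM("ml-passthru", directpath_io=True),
])
print(plan)
```

Returning every reason, rather than stopping at the first, matches the idea of a compliance scan that tells you why each VM was flagged.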
vSphere Lifecycle Manager performs prechecks before a live patch remediation task to ensure the hosts have sufficient available resources. If host(s) do not have sufficient resources, the load on the host(s) may need to be reduced before you can proceed with live patch remediation.
The limitations of live patch with vSphere 8 continue to exist in VCF 9.0. Live patch is not supported on hosts with TPM 2.0 enabled, on hosts with DPUs in use by vSphere Distributed Services Engine, or in conjunction with parallel remediation. For more, see the vSphere documentation, Configuring vSphere Lifecycle Manager for Live Patches.
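As a quick way to reason about these limitations before planning a remediation, the following minimal Python sketch checks a host against them. The `Host` dataclass and `live_patch_blockers` function are hypothetical illustrations, not a vSphere API, and the checks cover only the three limitations named above. Consult the vSphere documentation for the authoritative list.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical host model; field names are illustrative, not a vSphere API."""
    name: str
    tpm2_enabled: bool = False
    dpu_in_use_by_dse: bool = False  # DPU in use by vSphere Distributed Services Engine


def live_patch_blockers(host: Host, parallel_remediation: bool = False) -> list:
    """Return the documented limitations that rule out live patch for this host."""
    blockers = []
    if host.tpm2_enabled:
        blockers.append("TPM 2.0 enabled")
    if host.dpu_in_use_by_dse:
        blockers.append("DPU in use by vSphere Distributed Services Engine")
    if parallel_remediation:
        blockers.append("parallel remediation requested")
    return blockers


# Example: a cluster with one ordinary host and one TPM-enabled host.
for h in [Host("esx01"), Host("esx02", tpm2_enabled=True)]:
    reasons = live_patch_blockers(h)
    print(h.name, "eligible" if not reasons else "not eligible: " + ", ".join(reasons))
```

Parallel remediation is modelled as an argument rather than a host property because it is a choice made for the remediation task, not a characteristic of the host.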