
VMware vSphere Cluster Services Explained

VMware vSphere® 7.0 Update 1 was released in October 2020. Many VMware customers have already upgraded to vSphere 7 Update 1 but may not be fully aware of its new features, such as VMware vSphere Cluster Services.

As a VMware Technical Account Manager (TAM), I help my customers take full advantage of their VMware software and help them understand features like this one. VMware vSphere Cluster Services provides a mechanism that allows VMware to decouple vSphere DRS and vSphere HA from VMware vCenter Server®.

This feature ensures cluster services, such as vSphere DRS and vSphere HA, are all available to maintain the resources and health of the workloads running in the clusters independent of the vCenter Server instance availability.

vSphere Cluster Services deploys agent virtual machines to each vSphere cluster that is managed by vCenter Server 7.0 Update 1. These vSphere Cluster Services VMs are deployed to a cluster after it is created and hosts are added to it.

VMware vSphere® Distributed Resource Scheduler® (vSphere DRS) cannot function if vSphere Cluster Services VMs are not present in the vSphere cluster.

vSphere Cluster Services is enabled by default for all vSphere customers in 7.0 Update 1, whether they upgrade to that release or perform a fresh deployment.

vSphere Cluster Services architecture

The architecture for vSphere Cluster Services consists of a maximum of three agent virtual machines that are automatically deployed. These are lightweight VMs that form a Cluster Agents Quorum.

For clusters with fewer than three hosts, the number of agent VMs is equal to the number of ESXi hosts.
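
The sizing rule above can be written down as a one-line calculation. The following is a minimal sketch for illustration only; the function name and the constant are hypothetical and are not part of any VMware API.

```python
# Illustrative only: expresses the sizing rule described in this article.
# MAX_AGENT_VMS and expected_agent_vm_count are hypothetical names.
MAX_AGENT_VMS = 3  # the article states a maximum of three agent VMs per cluster


def expected_agent_vm_count(esxi_host_count: int) -> int:
    """Return the number of agent VMs the article's rule implies for a cluster."""
    if esxi_host_count < 0:
        raise ValueError("host count cannot be negative")
    # Fewer than three hosts: one agent VM per host. Otherwise: capped at three.
    return min(esxi_host_count, MAX_AGENT_VMS)


if __name__ == "__main__":
    for hosts in (1, 2, 3, 8):
        print(hosts, "host(s) ->", expected_agent_vm_count(hosts), "agent VM(s)")
```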

Cluster Services health

The three agent VMs are self-correcting. This means that when the agent VMs are unavailable, vSphere Cluster Services will try to power on the VMs automatically.

There are three health states: healthy, degraded and unhealthy. For more information about health states and the basic architecture, plus a video demo, check out this blog post all about vSphere Cluster Services.

The vSphere cluster shows an alert message if healthy vSphere Cluster Services VMs are not available in the cluster.

Agent VMs

vSphere Cluster Services VMs are present in each vSphere cluster and have the following characteristics:

  • Managed by vCenter Server.
  • Run a minimal installation of Photon OS.
  • Configured with 1 virtual CPU, 128 MB memory and a 2 GB thin-provisioned disk.
  • Do not have an assigned network interface card (NIC) or IP address; they use the VMCI interface to communicate with the hypervisor.
  • Deployed to a shared datastore. When a shared datastore is not available, vSphere Cluster Services VMs are deployed to local datastores.
  • Deployed from an OVA with a minimal installed profile of Photon OS. vSphere Cluster Services manages the resources, power state and availability of these VMs.
  • Do not support modification.

In the vSphere Client, vSphere Cluster Services VMs are not visible in the inventory tree of the Hosts and Clusters view. Instead, you’ll see them in the VMs and Templates view.

vSphere Cluster Services Components

vSphere Cluster Services introduces vSphere Cluster Services Manager and vSphere Cluster Services Resource Manager. Here’s more about those exciting new features:

ESX Agent Manager (EAM)

  • Deploys vSphere Cluster Services VMs to the ESXi hosts.
  • Receives VM placement information from vSphere Cluster Services Resource Manager

vSphere Cluster Services Resource Manager

  • New module running in the vmware-vpxd service
  • Manages vSphere Cluster Services VM initial placement and failover placement

vSphere Cluster Services Manager

  • New module running in the wcpsvc service
  • Manages and monitors a vSphere ESX Agent Manager agency for each set of cluster VMs

vSphere Cluster Services OVF

  • Virtual machine OVF template for vSphere Cluster Services VMs
  • vCenter Server patches and updates replace the OVF template with updated versions, if needed

vSphere Cluster Services during infrastructure maintenance

When a host is placed into Maintenance Mode, the vSphere Cluster Services agent VMs are migrated to other hosts within the cluster like regular VMs.

  • When there is only one host. vSphere Cluster Services VMs are powered off automatically when the host in a single-host cluster is put into Maintenance Mode, so the maintenance workflow is not blocked.
  • When there are two or more hosts. If the host being placed into maintenance has running vSphere Cluster Services VMs, those VMs are migrated to other hosts in the cluster, provided the other hosts have free resources and storage connectivity (shared storage).
  • Decommissioning a cluster. Put all the hosts into Maintenance Mode before deleting the cluster so that the vSphere Cluster Services VMs are cleaned up properly. If you delete the cluster without placing the hosts in Maintenance Mode, stale vSphere Cluster Services VMs keep running on the hosts. These stale VMs then cause issues when the hosts are added to a new cluster.
  • Disconnected host. When a host is disconnected, its vSphere Cluster Services VMs are not cleaned up, because the host is not reachable. New vSphere Cluster Services VMs are not created on the cluster’s other hosts, because it is not clear how long the host will stay disconnected. When the host is reconnected, its vSphere Cluster Services VM is registered again in the vCenter inventory. If a disconnected host is removed from the inventory, new vSphere Cluster Services VMs may be created on other cluster hosts if quorum is not reached.
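
The first two Maintenance Mode cases above can be summarized as a small decision function. This is a sketch of the behavior as this article describes it, not VMware's implementation; all names are hypothetical, and the article does not say what happens when no other host has free resources or shared storage, so that case is returned as unspecified rather than guessed.

```python
from enum import Enum


class VclsAction(Enum):
    """Hypothetical labels for the outcomes described in the article."""
    POWER_OFF = "agent VMs are powered off"
    MIGRATE = "agent VMs are migrated to other hosts"
    UNSPECIFIED = "not covered by the article"


def maintenance_mode_action(host_count: int,
                            others_have_free_resources: bool,
                            others_have_shared_storage: bool) -> VclsAction:
    """Map the article's Maintenance Mode cases to an outcome."""
    if host_count < 1:
        raise ValueError("a cluster needs at least one host")
    if host_count == 1:
        # Single-host cluster: VMs power off so maintenance is not blocked.
        return VclsAction.POWER_OFF
    if others_have_free_resources and others_have_shared_storage:
        # Multi-host cluster with capacity and storage connectivity elsewhere.
        return VclsAction.MIGRATE
    # Assumption: the article is silent on this case, so nothing is inferred.
    return VclsAction.UNSPECIFIED


if __name__ == "__main__":
    print(maintenance_mode_action(1, False, False).value)
    print(maintenance_mode_action(4, True, True).value)
```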

Retreat Mode

You can disable vSphere Cluster Services on a cluster using Retreat Mode; however, this means that vSphere DRS will not function. Retreat Mode should be used with caution and should be used only for the purposes identified below:

  • Proper cleanup before deleting the cluster. You can’t simply power off the three agent VMs: when vCenter Server detects that these VMs are powered off, it automatically powers them back on.
  • vSAN disk group removal / vSAN cluster recreation if vSphere Cluster Services VMs are placed in the vSAN datastore (details are mentioned in this KB article)
  • You are not using vSphere DRS and want to disable vSphere Cluster Services

Enabling Retreat Mode on a cluster affects the following cluster services:

  1. vSphere DRS will not function on that cluster if DRS is enabled for that cluster. That means the workloads running inside that cluster are not load-balanced.
  2. vSphere HA will not perform optimal placement during a host failure scenario, as HA depends on DRS for placement recommendations. HA will still power on the VMs, but these VMs might be powered on on a less optimal host.

Disable vSphere Cluster Services using Retreat Mode

  1. Log in to the vSphere Client and select the cluster where you want to disable vSphere Cluster Services.
  2. Copy the cluster domain ID domain-c<number> from the URL of the browser. In this case, it is domain-c1006.
  3. Select your vCenter Server and navigate to the vCenter Server Configure tab. Under Advanced Settings, click the Edit Settings button.
  4. Use the domain-c<number> copied in step 2 and add a new entry config.vcls.clusters.domain-cxxx.enabled (xxx signifies the number that you got in step 2).
  5. Set the Value to False, then click Save.
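
If you do this for several clusters, building the setting name by hand is easy to get wrong. The sketch below extracts the domain-c<number> ID from a browser URL and builds the advanced setting name and value described in these steps. It only produces strings to paste into Advanced Settings and does not talk to vCenter Server. The function names and the sample URL are hypothetical; the only assumption about the URL is that it contains the domain-c<number> token.

```python
import re
from typing import Tuple

# Matches the cluster domain ID, for example "domain-c1006".
_DOMAIN_ID = re.compile(r"domain-c\d+")


def cluster_domain_id(url: str) -> str:
    """Extract the domain-c<number> token from a vSphere Client URL."""
    match = _DOMAIN_ID.search(url)
    if match is None:
        raise ValueError("no domain-c<number> found in URL")
    return match.group(0)


def retreat_mode_setting(url: str) -> Tuple[str, str]:
    """Return the (name, value) pair to enter under Advanced Settings."""
    domain_id = cluster_domain_id(url)
    return (f"config.vcls.clusters.{domain_id}.enabled", "False")


if __name__ == "__main__":
    # Hypothetical URL, shown only to illustrate where the ID appears.
    sample = ("https://vcenter.example.com/ui/app/cluster;nav=h/"
              "urn:vmomi:ClusterComputeResource:domain-c1006:example/summary")
    name, value = retreat_mode_setting(sample)
    print(name, "=", value)
```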

Then you’ll see that all the vSphere Cluster Services VMs in the cluster have disappeared and the Cluster Services health will be set to Degraded.

If the cluster has DRS enabled, it will stop working, and a warning appears in the Cluster Summary.

Additional Resources

Leverage these additional resources to learn more about vSphere Cluster Services. Also, ask your VMware Account Executive how Technical Account Management Services can help your organization.