Embedded vSphere Cluster Services Deep Dive

About Embedded vSphere Cluster Services

vSphere Cluster Services (vCLS) has existed since vSphere 7.0 Update 1. It ensures that cluster services such as vSphere DRS and vSphere HA remain available to maintain the resources and health of the workloads running in the cluster.

In this document, the original version of vCLS is referred to as External vCLS. vCenter deployed it as an ordinary VM from an OVF image, a step prone to slowdowns or interruptions caused by limited connectivity and bandwidth between vCenter and ESX.

In vSphere 8.0 Update 3, VMware released a newer revision of the feature, known as Embedded vCLS, intended to simplify deployment and management of vCLS VMs. This version instead ships vCLS as a container image bundled with ESX. Creating and destroying the VMs only requires sending configuration updates, rather than uploading an image to create each VM and deleting the VM to destroy it. This makes management operations simpler and quicker, but introduces some behavioral differences.

Embedded vCLS was introduced in vSphere 8.0 Update 3 and is deployed when a supported vCenter has any supported hosts in a cluster. In a freshly deployed environment where vCenter and ESX run version 8.0 Update 3, Embedded vCLS is deployed instead of External vCLS as soon as the cluster is formed and the hosts are taken out of Maintenance Mode. In an environment upgraded from a previous version of vCenter, External vCLS continues to be used in clusters made up entirely of ESX hosts running version 8.0 Update 2 or older. Clusters whose hosts are upgraded to ESX 8.0 Update 3 are switched to Embedded vCLS automatically. For more details about rolling-upgrade scenarios, see the section on Mixing Compatible and Non-Compatible Hosts.

Similarities and Differences Between vCLS Versions

External and Embedded vCLS VMs both appear in the inventory under the “vCLS” folder and the cluster’s root resource pool, and both use containerized permissions. Both types are hidden from the vSphere UI sidebar in the Hosts and Clusters view. DRS can function if either type is present in the cluster. Because both types are system-managed, mass operations on VMs should generally ignore them. Both types are destroyed if the cluster is put into Retreat Mode.

Embedded vCLS VMs use a container runtime, which causes some behavioral differences. These VMs use a static image and a ramdisk for storage, so they do not interact with any inventory datastores. As a result, an Embedded vCLS VM’s datastore list is empty, and its configuration directory refers to a local path on ESX instead of a datastore-accessible path. vCLS datastore selection therefore no longer applies to Embedded vCLS. Because every VM has historically had a non-empty datastore list, existing scripts may rely on assumptions that no longer hold. Scripts and third-party solutions should gracefully handle an empty datastore list, as well as VM files that do not point to a browsable datastore.
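A script that reads a VM’s datastores or VMX path can guard against the Embedded vCLS case along these lines. This is a minimal sketch under stated assumptions: `VmSummary` is a hypothetical stand-in for whatever properties your tooling (for example pyVmomi) returns, datastores are represented by name, and a datastore-relative path is assumed to use the usual `[datastore] folder/file.vmx` form.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VmSummary:
    # Hypothetical stand-in for VM properties read by a script:
    # the VM name, the names of its datastores, and its VMX path.
    name: str
    datastores: List[str] = field(default_factory=list)
    vmx_path: str = ""


def primary_datastore(vm: VmSummary) -> Optional[str]:
    """Return the first datastore name, or None if the list is empty (Embedded vCLS)."""
    return vm.datastores[0] if vm.datastores else None


def is_datastore_path(path: str) -> bool:
    """True if the path looks datastore-relative, e.g. '[datastore1] app01/app01.vmx'."""
    return path.startswith("[") and "] " in path


def describe_storage(vm: VmSummary) -> str:
    """Summarize storage without assuming a datastore or browsable path exists."""
    ds = primary_datastore(vm)
    if ds is None or not is_datastore_path(vm.vmx_path):
        return f"{vm.name}: no browsable datastore"
    return f"{vm.name}: on {ds}"
```

The point of the design is that no code path indexes `datastores[0]` or hands the VMX path to a datastore browser without checking first. Any local path you use to test this is a placeholder; the actual ramdisk path on ESX is not specified here.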

Embedded vCLS VMs cannot be vMotioned or reconfigured, because their state and placement are managed fully automatically. To influence vCLS VM placement, an administrator can set Anti-Affinity rules so that vCLS VMs are kept away from certain hosts or VMs when possible. In general, mass VM operations should ignore Embedded vCLS VMs: they are placed automatically in response to host and cluster operations and should not be acted on explicitly.
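One way to keep bulk operations away from vCLS VMs is to filter on the “HDCS.agent” extraConfig key covered in the identification traits below. The sketch assumes each VM has already been reduced to a hypothetical `(name, extra_config)` pair, where `extra_config` is a plain dictionary of extraConfig keys to string values; collecting that data is left to your tooling.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of a VM: its name plus a dictionary built
# from vm.config.extraConfig (key -> string value).
VmRecord = Tuple[str, Dict[str, str]]


def is_vcls(extra_config: Dict[str, str]) -> bool:
    # "HDCS.agent" set to the string "true" marks both External and Embedded vCLS VMs.
    return extra_config.get("HDCS.agent") == "true"


def mass_operation_targets(vms: Iterable[VmRecord]) -> List[str]:
    """Return the names of VMs that a bulk operation may act on, skipping vCLS VMs."""
    return [name for name, extra in vms if not is_vcls(extra)]
```

Filtering on extraConfig rather than on the “vCLS-” name prefix avoids both false positives from renamed workload VMs and misses caused by the changed ManagedByInfo values.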

Embedded vCLS VMs have some differences in identification from External vCLS VMs, including their ManagedByInfo, which diverges from previous documentation. Other identification traits remain unchanged for compatibility.

  • Extra Config: Both External and Embedded vCLS VMs have an entry in vm.config.extraConfig that workload VMs lack: “HDCS.agent”, set to the string “true”. Embedded vCLS keeps this entry and adds a second one, “vCLSCRX.agent”, also set to “true”. Scripts can use the presence of “HDCS.agent” to determine whether a VM is a vCLS VM of either type, and the presence of “vCLSCRX.agent” to determine whether it is Embedded or External. This is the recommended way to identify vCLS VMs, as it is the most reliably available method.
  • ManagedByInfo: Both External and Embedded vCLS VMs have vm.config.managedBy populated with a valid ManagedByInfo, but for Embedded vCLS its values have changed to reflect the new deployment mechanism. Instead of extensionKey “com.vmware.vim.eam” and type “cluster-agent”, Embedded vCLS VMs use extensionKey “VirtualCenter” and type “vcls-entity”. Scripts that detect vCLS VMs based on the “com.vmware.vim.eam”/“cluster-agent” pair must either also match the “VirtualCenter”/“vcls-entity” pair or switch to using vm.config.extraConfig to detect vCLS VMs.
  • Config Path: External vCLS VMs follow the usual expectation of datastore-relative paths, since they are deployed as typical VMs. In contrast, an Embedded vCLS VM is a container with no datastore, so its VMX path and similar configured paths point to a local ramdisk directory on its host. These paths are not accessible through a datastore browser. The path is not recommended for detecting Embedded vCLS VMs, as other system VMs may use this style of path in the future.
  • Guest OS: External vCLS VMs use a generic Linux guest OS type. Embedded vCLS VMs use the container runtime guest type “crxSys1Guest”, which appears as “VMware Photon CRX” in the vSphere UI.
  • VM Name: The VM name prefix is unchanged between External and Embedded vCLS; both start with “vCLS-”. The name is not recommended for detecting Embedded vCLS VMs, as anyone can rename a VM to match this pattern. However, scripts that already use it to detect vCLS VMs will continue to match both External and Embedded vCLS.
  • Virtual Property: Querying the virtual property “isClusterVM” returns true for both External and Embedded vCLS VMs. Unlike vm.config.extraConfig, which stores the string “true”, this property returns an actual boolean.
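The identification rules above can be expressed as a small classifier. This is a minimal sketch, not an official VMware API: `VclsKind`, `ManagedBy`, and the function names are hypothetical, and extraConfig is assumed to have already been converted into a dictionary. The extraConfig check is primary, as recommended; the ManagedByInfo check is a fallback that accepts both the old and the new values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class VclsKind(Enum):
    WORKLOAD = "workload"
    EXTERNAL = "external"
    EMBEDDED = "embedded"


@dataclass
class ManagedBy:
    # Mirrors the two ManagedByInfo fields: extensionKey and type.
    extension_key: str
    type: str


# ManagedByInfo values named in the identification notes.
LEGACY_MANAGED_BY = ("com.vmware.vim.eam", "cluster-agent")
EMBEDDED_MANAGED_BY = ("VirtualCenter", "vcls-entity")


def classify_by_extra_config(extra_config: Dict[str, str]) -> VclsKind:
    """Recommended check: HDCS.agent marks vCLS, vCLSCRX.agent marks Embedded."""
    if extra_config.get("HDCS.agent") != "true":
        return VclsKind.WORKLOAD
    if extra_config.get("vCLSCRX.agent") == "true":
        return VclsKind.EMBEDDED
    return VclsKind.EXTERNAL


def is_vcls_by_managed_by(managed_by: Optional[ManagedBy]) -> bool:
    """Fallback check that accepts both the old and the new ManagedByInfo values."""
    if managed_by is None:
        return False
    pair = (managed_by.extension_key, managed_by.type)
    return pair in (LEGACY_MANAGED_BY, EMBEDDED_MANAGED_BY)
```

With pyVmomi, a dictionary of this shape could be built as `{opt.key: opt.value for opt in vm.config.extraConfig}`, and `vm.config.managedBy` exposes `extensionKey` and `type`; treat that wiring as an assumption to verify against your environment.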

Scripts that search for vCLS VMs to perform an action, that exclude vCLS VMs using the ManagedByInfo, or that perform mass operations on both vCLS and workload VMs, may need to be changed to work with these differences.

Size of the Embedded vCLS VMs

Embedded vSphere Cluster Service VMs are very small compared to workload VMs. Each consumes 1 vCPU, 160 MB of memory, and no storage. The table below compares the specifications of External and Embedded vCLS VMs:

Measure              | External vCLS    | Embedded vCLS
Preferred VM count   | 3                | 2
Memory               | 128 MB           | 160 MB
Memory reservation   | 100 MB           | 160 MB
Swap size            | 256 MB           | 0 MB
CPU                  | 1                | 1
CPU reservation      | 100 MHz          | 0 MHz
Hard disk            | 2 GB             | N/A
Ethernet adapter     | 0 (no NIC)       | 0 (no NIC)
VMDK size            | ~245 MB          | N/A
Storage space        | ~480 MB          | N/A
VIB size             | N/A              | 1.3 MB
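To put the table in cluster-level terms, the per-VM figures can be multiplied by the preferred VM count. This arithmetic sketch assumes the cluster has enough hosts to run the preferred number of VMs of each type; smaller clusters may run fewer.

```python
# Per-VM figures from the table (memory and swap in MB, CPU reservation in MHz).
SPECS = {
    "External": {"count": 3, "memory": 128, "mem_reservation": 100, "swap": 256, "cpu_reservation": 100},
    "Embedded": {"count": 2, "memory": 160, "mem_reservation": 160, "swap": 0, "cpu_reservation": 0},
}


def cluster_totals(spec: dict) -> dict:
    """Multiply each per-VM figure by the preferred VM count."""
    count = spec["count"]
    return {key: value * count for key, value in spec.items() if key != "count"}


for version, spec in SPECS.items():
    print(version, cluster_totals(spec))
```

For External vCLS this gives 384 MB of configured memory, a 300 MB memory reservation, 768 MB of swap, and a 300 MHz CPU reservation. For Embedded vCLS it gives 320 MB of memory, all of it reserved, with no swap and no CPU reservation.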

Embedded vCLS During Maintenance Mode

Embedded vCLS VM placement will automatically respond to putting supported hosts into Maintenance Mode (or Standby Mode).

Putting a host into Maintenance Mode causes vCenter to destroy any Embedded vCLS VM running on it. At the same time, vCenter tries to maintain the desired number of available vCLS VMs and will start an Embedded vCLS VM on a different host if possible. However, allowing the EnterMaintenanceMode operation to proceed takes priority, so putting the entire cluster, or the last available host, into Maintenance Mode is still possible. The EnterMaintenanceMode operation waits until the vCLS VM on the target host is removed or powered off before completing. If the EnterMaintenanceMode operation fails, the host is considered available again and becomes a candidate for running vCLS VMs.

Cluster maintenance mode recommendations are aware of the Embedded vCLS deployment state, and will not provide recommendations that would make it impossible to have at least 1 vCLS VM running. In other words, no recommendation will make all vCLS-capable hosts unavailable simultaneously.
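The guarantee can be modeled as a simple check on a proposed set of hosts to put into maintenance mode. This is a conceptual sketch only, with hypothetical names, and not how vCenter implements its recommendations.

```python
from typing import Set


def recommendation_allowed(
    vcls_capable: Set[str], already_unavailable: Set[str], entering_mm: Set[str]
) -> bool:
    """Accept a maintenance-mode plan only if at least one vCLS-capable host stays available."""
    still_available = vcls_capable - already_unavailable - entering_mm
    return len(still_available) >= 1
```

This constrains recommendations only; as described above, an administrator can still put the last available host into Maintenance Mode directly.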

VMs are similarly destroyed when a host enters other “Unavailable” states, such as Disconnected, and when a host is moved out of a cluster or removed from the inventory. These operations are best-effort, however, and do not wait for the VM to be destroyed before proceeding. When a host enters the Not Responding state, its vCLS VM is treated as unavailable, and vCLS VMs may be started on other hosts to satisfy the desired number of available vCLS VMs.

When a supported host becomes “Available” by being in the Connected state and not in the middle of certain configuration operations, it becomes a candidate for running vCLS VMs, and may have VMs deployed on it if needed to satisfy the desired number of available vCLS VMs.

By actively maintaining the desired number of available vCLS VMs based on host availability, vCenter aims to avoid or minimize any periods of vCLS unavailability.
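The availability rules in this section can be modeled as a reconciliation step that compares where vCLS VMs run against which hosts are available. The sketch is illustrative only: the `Host` fields, the `reconcile` function, and the default desired count of 2 (taken from the preferred Embedded vCLS VM count in the size table) are assumptions, all hosts are assumed to support Embedded vCLS, and vCenter’s real placement logic, including Anti-Affinity rules, is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Host:
    # Hypothetical host model; connection_state uses vSphere's
    # "connected" / "disconnected" / "notResponding" values.
    name: str
    connection_state: str
    in_maintenance: bool = False
    in_standby: bool = False
    reconfiguring: bool = False  # stands in for "certain configuration operations"


def is_available(host: Host) -> bool:
    """Connected and not in Maintenance Mode, Standby Mode, or mid-reconfiguration."""
    return host.connection_state == "connected" and not (
        host.in_maintenance or host.in_standby or host.reconfiguring
    )


def reconcile(hosts: List[Host], running_on: List[str], desired: int = 2) -> Dict[str, List[str]]:
    """Decide which vCLS VMs stop counting and which hosts should start a new one."""
    available = [h.name for h in hosts if is_available(h)]
    # VMs on unavailable hosts no longer count toward the desired number;
    # vCenter destroys them where it can (best-effort for some states).
    retire = [name for name in running_on if name not in available]
    keep = [name for name in running_on if name in available]
    candidates = [name for name in available if name not in keep]
    start = candidates[: max(0, desired - len(keep))]
    return {"retire": retire, "start": start}
```

If no host is available, `start` comes back empty, mirroring the rule that entering Maintenance Mode takes priority over keeping a vCLS VM running.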

Other VM Behaviours

Embedded vCLS VMs are completely non-interactive. They do not support Console access (Web or VMRC), and do not have any credentials that would let an administrator enter the guest OS for monitoring.

These VMs are destroyed and recreated whenever they are powered off or placed on different hosts. This means that the set of Managed Object IDs representing a cluster’s vCLS VMs will change whenever such actions occur. This already occurs in some cases for External vCLS VMs, such as during Retreat Mode, but will likely be more frequent for Embedded vCLS VMs. Scripts should not assume that the list will remain static.
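Because the set of vCLS Managed Object IDs can change between runs, a script can re-query it before each action instead of relying on a cached copy. In this minimal sketch, `query` is a hypothetical callable standing in for whatever inventory lookup your tooling performs, and any MoIDs you test it with are made up.

```python
from typing import Callable, Dict, Set


def refresh_vcls_ids(query: Callable[[], Set[str]], cached: Set[str]) -> Dict[str, Set[str]]:
    """Re-run the vCLS inventory query and report how the MoID set has changed."""
    current = set(query())
    return {
        "current": current,
        "gone": cached - current,  # VMs destroyed since the cache was taken
        "new": current - cached,   # VMs created since the cache was taken
    }
```

A script would then act only on `current`, treating any non-empty `gone` or `new` set as normal churn rather than an error.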

These VMs are ignored by vSphere HA and cannot be made HA-protected. They report as not applicable, even when powered on in a cluster with HA enabled.

Mixing Compatible and Non-Compatible Hosts

vCenter deploys Embedded vCLS on clusters of compatible hosts (running ESX 8.0 Update 3 or later). For clusters of non-compatible hosts (older than ESX 8.0 Update 3), it instead deploys the original “External” version of vCLS.

However, the vCenter inventory always allows multiple versions of ESX to coexist, and such states are inevitable during a rolling ESX upgrade. For mixed-version clusters of compatible and non-compatible hosts, vCenter uses Embedded vCLS whenever at least one available host in the cluster supports it.

This naturally creates a point at which a cluster is “upgraded” from External vCLS to Embedded vCLS when hosts start exiting Maintenance Mode after upgrading to a compatible version. vCenter makes this upgrade seamless by waiting until the first Embedded vCLS VM becomes available before it deactivates External vCLS and destroys those VMs. 

Conversely, if all compatible hosts become unavailable and only non-compatible hosts remain, vCenter can perform a “downgrade” to External vCLS. However, because this step is triggered by a host becoming unavailable, it cannot provide the same seamless handover, and DRS may be unavailable for a period between Embedded vCLS teardown and External vCLS deployment.
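The selection rule for mixed-version clusters can be sketched as follows. The `EsxHost` fields and the version-tuple encoding are hypothetical; the function only returns the target vCLS version and does not model the handover that keeps External vCLS running until the first Embedded vCLS VM is available.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EsxHost:
    # Hypothetical host model: "8.0 Update 3" is encoded as major=8, minor=0, update=3.
    name: str
    major: int
    minor: int
    update: int
    available: bool


MIN_EMBEDDED = (8, 0, 3)  # ESX 8.0 Update 3


def supports_embedded(host: EsxHost) -> bool:
    """Tuple comparison orders versions by major, then minor, then update."""
    return (host.major, host.minor, host.update) >= MIN_EMBEDDED


def vcls_mode(hosts: List[EsxHost]) -> str:
    """'Embedded' if any available host supports it, otherwise 'External'."""
    if any(h.available and supports_embedded(h) for h in hosts):
        return "Embedded"
    return "External"
```

During a rolling upgrade, an upgraded host that is still in Maintenance Mode does not count, so the mode flips to Embedded only once such a host becomes available, and flips back if every compatible host becomes unavailable.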

In Case a vCenter Downgrade is Needed

In case vCenter Server needs to be downgraded to an older version where Embedded vCLS is not available, any Embedded vCLS VMs already deployed will be mis-recognized as either workload VMs or External vCLS VMs. The downgraded vCenter will be unable to delete these VMs using normal inventory operations. Their presence may also lead to instability when trying to generate DRS recommendations.

If a downgrade is needed but has not been performed yet, destroy all vCLS VMs using Retreat Mode beforehand. If the downgrade has already been performed, or Retreat Mode did not destroy the VMs, follow the Knowledge Base instructions on how to manually destroy Embedded vCLS on an ESX host.

Other VMware Product Interoperability

  • vSAN
    Since Embedded vCLS VMs do not require any storage backing, this feature does not interact with vSAN.
  • Aria Automation and Aria Operations Manager
    vCLS should not impact partner workflows such as backup and monitoring. Since these VMs are managed by vCLS, there is no reason to configure them for backup, capacity reclamation, or other workload operations. These VMs can be identified as described in the section on Similarities and Differences Between vCLS Versions.
  • Products/solutions without any interop issues
    • VMware Cloud Foundation – Cloud Builder and SDDC Manager are not affected
    • VMware NSX 
    • VMware vCloud Director
    • VMware Live Site Recovery

Third Party Impact

Since these VMs are managed by vCLS, there is no reason to configure backup for them; restoring them from backup during a recovery operation is unnecessary and might fail. Other forms of monitoring and recovery are also unnecessary and may lead to unexpected outcomes if Embedded vCLS VMs are treated like workload VMs or External vCLS VMs. See the section on Similarities and Differences Between vCLS Versions above for more information.