
One of the most frequent questions support receives shortly after a customer creates their first Stretched Cluster in the VMware Cloud on AWS service is:

What is this extra host in my cluster?

That is the vSAN Witness Host. To understand why we need the vSAN Witness Host, we need to take a step back and revisit how vSAN protects data. vSAN is a shared-nothing object store. One of the defining attributes of an object store is that availability is configurable on a per-object basis. An object in vSAN consists of multiple components. The actual number and composition depend on the Failures to Tolerate policy configuration, but for now, remember that for any vSAN object to be available, vSAN must be able to access more than 50% of the components associated with that object. To accomplish this, vSAN usually treats each host as an independent failure domain and places each component of a given object on a separate vSAN host, ensuring that vSAN can maintain availability in the face of a failure.
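To make the "more than 50%" rule concrete, here is a minimal Python sketch. It assumes one vote per component (vSAN's real vote weighting can differ), and the `Component` class and `is_accessible` helper are hypothetical illustrations, not vSAN APIs. The example object is a typical RAID-1 layout tolerating one failure: two replicas plus a witness component, each on its own host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    host: str  # each component sits on its own host (its own failure domain)


def is_accessible(components: list[Component], failed_hosts: set[str]) -> bool:
    """True if strictly more than half of the object's components are reachable."""
    reachable = sum(1 for c in components if c.host not in failed_hosts)
    return reachable * 2 > len(components)


# A RAID-1 object protecting against one failure: two replicas plus a witness
# component, each placed on a separate host.
obj = [
    Component("replica-a", "host-1"),
    Component("replica-b", "host-2"),
    Component("witness", "host-3"),
]

print(is_accessible(obj, failed_hosts={"host-2"}))            # True: 2 of 3 reachable
print(is_accessible(obj, failed_hosts={"host-2", "host-3"}))  # False: 1 of 3 reachable
```

Because every component lives on a different host, any single host failure removes at most one vote, and the object stays available.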

Stretched Clusters take the same shared-nothing core and solve a much more complicated problem: how do you use such a system when a large number of hosts could be lost simultaneously? Solving this problem started with the introduction of fault domains.

Fault domains allow the vSphere Administrator to tell vSAN which physical hosts share dependencies (for example, hosts in the same rack), and vSAN then uses this information when deciding where to place components. By never putting more than one component of an object in any single fault domain, vSAN can distribute the components across genuinely redundant infrastructure, avoiding shared dependencies that vSAN could not otherwise know about and that could impact data accessibility.
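The placement rule can be sketched in a few lines of Python. This is a deliberately simplistic, hypothetical helper (`place_components` is not part of vSAN, whose real placement logic also weighs capacity and other factors); it only illustrates the "at most one component per fault domain" constraint, using racks as the example fault domains.

```python
def place_components(component_names: list[str],
                     fault_domains: dict[str, list[str]]) -> dict[str, str]:
    """Assign each component to a host, using at most one host per fault domain."""
    if len(component_names) > len(fault_domains):
        raise ValueError("need at least one fault domain per component")
    placement: dict[str, str] = {}
    # zip pairs each component with a distinct fault domain, so no two
    # components of the object can share a rack.
    for name, hosts in zip(component_names, fault_domains.values()):
        placement[name] = hosts[0]  # any host in the domain would do
    return placement


fault_domains = {
    "rack-a": ["host-1", "host-2"],
    "rack-b": ["host-3", "host-4"],
    "rack-c": ["host-5", "host-6"],
}
placement = place_components(["replica-a", "replica-b", "witness"], fault_domains)
print(placement)  # {'replica-a': 'host-1', 'replica-b': 'host-3', 'witness': 'host-5'}

# Losing an entire rack (both of its hosts at once) costs only one component.
failed = set(fault_domains["rack-b"])
print([name for name, host in placement.items() if host in failed])  # ['replica-b']
```

Without fault domains, a host-by-host placement could put two components on hosts in the same rack, and a single rack outage would then remove a majority of the object's votes.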

vSAN Stretched Clusters build on top of fault domains and introduce the concept of site availability. Site availability adds a copy of the data in addition to the local failure protection. An old storage greybeard may recognize the following example as the conceptual equivalent of RAID 61 (a RAID-1 mirror between redundant RAID-6 data sets).
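A quick back-of-the-envelope calculation shows what the RAID 61 analogy means for raw capacity. This sketch assumes local RAID-6 laid out as 4 data + 2 parity blocks (1.5x overhead) in each site, mirrored across two sites; it ignores the small witness components and any slack space, so treat it as an illustration rather than a sizing tool. `raw_capacity_gb` is a hypothetical helper.

```python
def raw_capacity_gb(usable_gb: float, data_blocks: int = 4,
                    parity_blocks: int = 2, site_copies: int = 2) -> float:
    """Raw capacity consumed by site mirroring on top of local erasure coding."""
    local_overhead = (data_blocks + parity_blocks) / data_blocks  # 1.5x for 4+2
    return usable_gb * local_overhead * site_copies


print(raw_capacity_gb(100))  # 300.0: RAID-6 (1.5x) in each site, mirrored across 2 sites
```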

When an object is assigned a policy with Site Disaster Tolerance set to Dual Site Mirroring, vSAN commits any writes to that object synchronously to both the preferred and non-preferred fault domains, while reading locally from whichever fault domain the VM resides in.
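The I/O semantics of Dual Site Mirroring can be modeled with a toy in-memory object. The `Site` and `DualSiteMirroredObject` classes are hypothetical and omit failure handling entirely; the sketch only shows the two properties described above: a write returns (is acknowledged) only after both sites have committed it, and a read is served by the site where the VM runs.

```python
class Site:
    """One fault domain holding a full copy of the object's data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[int, bytes] = {}

    def commit(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data


class DualSiteMirroredObject:
    def __init__(self, preferred: Site, non_preferred: Site) -> None:
        self.sites = {site.name: site for site in (preferred, non_preferred)}

    def write(self, lba: int, data: bytes) -> None:
        # Synchronous mirroring: return (acknowledge) only after both sites commit.
        for site in self.sites.values():
            site.commit(lba, data)

    def read(self, lba: int, vm_site: str) -> bytes:
        # Reads are served locally by the fault domain the VM runs in.
        return self.sites[vm_site].blocks[lba]


preferred, non_preferred = Site("preferred"), Site("non-preferred")
obj = DualSiteMirroredObject(preferred, non_preferred)
obj.write(0, b"hello")
print(preferred.blocks == non_preferred.blocks)  # True: both copies committed
print(obj.read(0, vm_site="non-preferred"))      # b'hello', served locally
```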

How the data is stored within each fault domain depends on the Failures to Tolerate policy configuration. So why the witness? Each copy is protected from failure within its fault domain, but there are still only two copies of the data. With only two fault domains, or more precisely with only two votes, vSAN would not be able to distinguish a genuine failure from a transient network outage. In the event of an actual fault, neither site could independently determine whether it was the survivor or the victim. More importantly from a storage perspective, with only two fault domains it would be possible for an application to be active in both fault domains simultaneously. This scenario, known as split-brain, would break any application that depends on ACID storage semantics (most of them). To protect against split-brain, vSAN Stretched Clusters implement the vSAN Witness Host.
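The two-vote problem is easy to see in a tiny model. Assuming, as a simplification, one vote per fault domain, the sketch below computes what the preferred site can observe when its peer has genuinely failed versus when only the inter-site link is down. `reachable_votes` is a hypothetical helper.

```python
TOTAL_VOTES = 2  # preferred + non-preferred, and no witness


def reachable_votes(peer_alive: bool, link_up: bool) -> int:
    """Votes one site can count: its own, plus its peer's if the peer is reachable."""
    return 1 + (1 if peer_alive and link_up else 0)


genuine_failure = reachable_votes(peer_alive=False, link_up=True)
network_partition = reachable_votes(peer_alive=True, link_up=False)

print(genuine_failure == network_partition)  # True: both cases look identical locally
print(genuine_failure * 2 > TOTAL_VOTES)     # False: 1 of 2 votes is not a majority
```

Whatever rule a site applies to a 1-of-2 view, it gets one case wrong: keep running and a partition turns into split-brain; stop and a genuine site failure takes down both sites.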

The vSAN Witness Host is a stripped-down vSphere host running vSAN that cannot run virtual machines and is used exclusively to store the cross-site witness components. These lightweight metadata components are dynamically added to any object with a Dual Site Mirroring policy and are used as the tie-breaking vote when determining object availability. Now, in the event of a failure, each site can use its connection to the vSAN Witness Host to assess its own health independently: the “surviving” site restarts any failed workloads, and the “failed” site shuts down any running workloads.
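Adding a third voter breaks the tie. The sketch below is a simplified model, assuming one vote each for the preferred site, the non-preferred site, and the witness; the partitions are passed in as input rather than discovered, and `partition_outcome` is a hypothetical helper. A failed site simply appears in no partition, and in the link-failure case we assume the witness ends up in the preferred site's partition, since it can belong to only one.

```python
VOTES = {"preferred": 1, "non-preferred": 1, "witness": 1}


def partition_outcome(partitions: list[set[str]]) -> dict[str, str]:
    """For each site, 'run' if its partition holds a strict majority of votes."""
    total = sum(VOTES.values())
    outcome: dict[str, str] = {}
    for members in partitions:
        votes = sum(VOTES[m] for m in members)
        verdict = "run" if votes * 2 > total else "shut down"
        for member in members:
            if member != "witness":  # the witness never runs VMs
                outcome[member] = verdict
    return outcome


# The non-preferred Availability Zone fails; it appears in no partition.
print(partition_outcome([{"preferred", "witness"}]))
# {'preferred': 'run'}

# The inter-site link fails. The witness can sit in only one partition;
# here we assume it ends up with the preferred site.
print(partition_outcome([{"preferred", "witness"}, {"non-preferred"}]))
# {'preferred': 'run', 'non-preferred': 'shut down'}
```

Because at most one partition can hold two of the three votes, at most one site keeps running, which is exactly the split-brain protection the witness provides.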

vSAN (and by extension VMware Cloud on AWS) abstracts the complexity of this process away with VM Storage Policies.  Declare how resilient each object should be, and vSAN will make it so.  It’s about this point in the conversation that someone correctly asks:

What happens if the witness is unavailable?

When the witness itself is unavailable, the two remaining fault domains use one another to maintain quorum. If an Availability Zone fails while the vSAN Witness Host is inaccessible, vSAN does not have sufficient votes to maintain access and preemptively shuts down the workload to protect against any potential split-brain scenario. Fortunately, the VMware Cloud on AWS service largely mitigates this risk: VMware Cloud on AWS support monitors and manages the vSAN Witness Host.
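The double-failure case can be walked through step by step. This sketch again assumes one vote each for the two sites and the witness, and that all surviving members can reach one another; `running_sites` is a hypothetical helper, not a vSAN interface.

```python
TOTAL_VOTES = 3  # preferred + non-preferred + witness


def running_sites(alive: set[str]) -> set[str]:
    """Sites that keep running, assuming every alive member can reach the others."""
    if len(alive) * 2 > TOTAL_VOTES:
        return alive - {"witness"}
    return set()  # no majority: workloads are shut down to avoid split-brain


alive = {"preferred", "non-preferred", "witness"}
for lost in ("witness", "non-preferred"):
    alive.discard(lost)
    print(f"lost {lost}: running on {sorted(running_sites(alive)) or 'nothing'}")
# lost witness: running on ['non-preferred', 'preferred']
# lost non-preferred: running on nothing

# If the witness is replaced before the Availability Zone fails, the AZ failure is survivable.
print(sorted(running_sites({"preferred", "witness"})))  # ['preferred']
```

The last line is why rapid witness replacement matters: the shorter the window without a witness, the smaller the chance that an Availability Zone failure lands inside it.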

If any issues are identified with the existing witness, a replacement is provisioned automatically in as little as 5 minutes. In the worst-case scenario, where the witness suffers an unplanned terminal error, auto-remediation services detect and replace the witness, returning the cluster to full health in around 20 minutes. This speedy recovery is possible because the witness components are metadata only. Since they are relatively small, they can be quickly recreated once a healthy witness is provided. VMC Support ensures that the vSAN Witness Host is replaced rapidly when an issue is found, mitigating the risk of downtime.

The vSAN Witness Host is a critical part of VMware’s Stretched Clusters offering, but one that has been mostly removed from view by design in VMware Cloud on AWS. Stretched Clusters for VMware Cloud on AWS enable customers to focus exclusively on requirements: simply declare via policy how vSAN should protect a VM and/or VMDK, and vSAN will make it so. In my opinion, this is one of the primary advantages of the VMware Cloud on AWS service, and Stretched Clusters for VMC on AWS, in particular, deliver on this promise.

@glnsize