A Virtual SAN question that I've been asked very often concerns the behavior and logic of the witness component in Virtual SAN. Apparently this is somewhat of a cloudy topic, so I wanted to take the opportunity to answer that question for those looking for more details, ahead of the official white paper where the content of this article is covered in greater depth. So be on the lookout for that.

The behavior and logic I'm about to explain here are 100% transparent to the end user, and there is nothing to be concerned about with regard to the layout of the witness components; this behavior is managed and controlled by the system. This article is intended to provide an understanding of the number of witness components you may see and why.

Virtual SAN objects are comprised of components that are distributed across the hosts of a vSphere cluster configured with Virtual SAN. These components are stored in distinct combinations of disk groups within the Virtual SAN distributed datastore. Components are transparently assigned caching and buffering capacity from flash-based devices, with their data "at rest" on the magnetic disks.

Witness components are part of every storage object. Virtual SAN witness components contain object metadata, and their purpose is to serve as tiebreakers whenever availability decisions have to be made in the Virtual SAN cluster, in order to avoid split-brain behavior and satisfy quorum requirements.

Virtual SAN Witness components are defined and deployed in three different ways:

  • Primary Witness
  • Secondary Witness
  • Tiebreaker Witness

Primary Witnesses: We need at least (2 * FTT) + 1 nodes in a cluster to be able to tolerate FTT number of node/disk failures. If, after placing all the data components, the configuration does not span the required number of nodes, primary witnesses are placed on exclusive nodes until the configuration spans (2 * FTT) + 1 nodes.

Secondary Witnesses: Secondary witnesses are created to make sure that every node has equal voting power towards quorum. This is important because every node failure should affect the quorum equally. Secondary witnesses are added so that every node gets an equal number of components, including the nodes that hold only primary witnesses. So the total count of data components + witnesses on each node is equalized in this step.

Tiebreaker witness: If, after adding primary and secondary witnesses, we end up with an even number of total components (data + witnesses) in the configuration, then we add one tiebreaker witness to make the total component count odd.
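
The three steps above can be sketched in a few lines of Python. To be clear, this is an illustrative model of the logic as described, not the actual Virtual SAN code; the function name and the `placement` dictionary (node name to data-component count) are my own:

```python
def witness_count(placement: dict, ftt: int) -> dict:
    """Hypothetical sketch of the three-step witness calculation.

    placement maps node name -> number of data components on that node.
    ftt is the Number of Failures to Tolerate policy value.
    """
    # Step 1: primary witnesses go on exclusive nodes until the
    # configuration spans (2 * FTT) + 1 nodes.
    nodes_needed = 2 * ftt + 1
    primary = max(0, nodes_needed - len(placement))

    # Step 2: secondary witnesses equalize the vote count on every node,
    # including nodes that hold only a primary witness (1 vote each here).
    votes = list(placement.values()) + [1] * primary
    secondary = sum(max(votes) - v for v in votes)

    # Step 3: if the total component count is even, add one tiebreaker.
    total = sum(votes) + secondary
    tiebreaker = 1 if total % 2 == 0 else 0

    return {"primary": primary, "secondary": secondary, "tiebreaker": tiebreaker}
```

For example, four nodes with one data component each (4 total, an even count) would yield no primary or secondary witnesses and a single tiebreaker witness.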

Let me incorporate the definitions and logic described above into two real-world scenarios and explain why the witness components were placed the way they were:

  • Scenario 1:  VM with a 511 GB VMDK with Failures to Tolerate 1

Note: Virtual SAN components are limited to 255 GB each. Objects greater than 255 GB are split into multiple components distributed across hosts. This explains the behavior illustrated in both examples below, with a RAID 1 set configured from multiple concatenated RAID 0 sets.

Example 1

There is only 1 witness deployed in this particular scenario. Why?

In this particular scenario all of the RAID 0 stripes were placed on different nodes. Take a closer look at the host names.

Now, why did that happen that way, and how does it relate to the witness types described above?

When the witness calculation is performed in this scenario, the witness component logic comes into play as listed below: 

  • Primary witnesses: Data components are spread across 4 nodes (which is greater than 2*FTT+1). So we do not need primary witnesses.
  • Secondary witnesses: Since each node participating in the configuration has exactly one component, we do not need any secondary witnesses to equalize votes.
  • Tiebreaker witness: Since the total component count in the configuration is 4 (an even number), we need one tiebreaker witness to make it odd.

  • Scenario 2: VM with a 515 GB VMDK with Failures to Tolerate 1

In this scenario there were 3 witnesses deployed. Why?

In this particular scenario, some of the RAID 0 stripes were placed on the same nodes. Take a closer look at the host names. The components are laid out in the following configuration:

  • 2 components on node vsan-host-1.pml.local
  • 2 components on node vsan-host-4.pml.local
  • 1 component on node vsan-host-3.pml.local
  • 1 component on node vsan-host-2.pml.local

When the witness calculation is performed in this scenario, the witness component logic comes into play as listed below:

  • Primary witnesses: Data components are spread across 4 nodes (which is greater than 2*FTT+1). So we do not need primary witnesses.
  • Secondary witnesses: Since two nodes have 2 votes each and two nodes have only 1 vote each, we need to add one vote (witness) on each of the following nodes:
    • vsan-host-3.pml.local
    • vsan-host-2.pml.local
  • Tiebreaker witness: After adding the two witnesses above, the total component count in the configuration is 8 (6 data + 2 witnesses), which is even, so we need one tiebreaker witness; that is the third witness.
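
You can verify the Scenario 2 arithmetic with a quick tally. The host names and component counts below come from the layout shown above; the variable names are just for illustration:

```python
# Data-component placement from Scenario 2 (node -> component count).
placement = {
    "vsan-host-1.pml.local": 2,
    "vsan-host-4.pml.local": 2,
    "vsan-host-3.pml.local": 1,
    "vsan-host-2.pml.local": 1,
}

# Secondary witnesses: raise every node to the highest vote count (2).
max_votes = max(placement.values())
secondary = sum(max_votes - v for v in placement.values())  # 2 witnesses

# Tiebreaker: total of 8 components (6 data + 2 witnesses) is even,
# so one more witness is added to make the count odd.
total = sum(placement.values()) + secondary
tiebreaker = 1 if total % 2 == 0 else 0

print(secondary + tiebreaker)  # prints 3
```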

For the most part, people expect the witness count to depend on the Failures to Tolerate policy (0 to 3). In fact, the witness count is entirely dependent on how the components and data get placed, and is not determined by a given policy.

Again, as I said at the very beginning of the article, this behavior is 100% transparent to the end user, and there is nothing to be concerned about, since the behavior is managed and controlled by the system.
– Enjoy
For future updates, be sure to follow me on Twitter: @PunchingClouds