Fault Tolerance and Isolation Response

Most of us are familiar with how the HA Isolation Response works.  If a host in a cluster loses connectivity with all the other nodes in the cluster, it is deemed isolated.  When this happens, the cluster’s “Isolation Response” setting dictates how the host will react.

The options are:
(1)    Leave powered on
(2)    Power off
(3)    Shut down

The “Leave powered on” option is there to protect against a false positive, meaning the host thinks it’s isolated when it’s really not.  This typically occurs in response to a network problem outside of vSphere, or when there is insufficient network redundancy.  The “Power off” and “Shut down” options halt the VMs, which releases the VMFS disk locks and enables the non-isolated nodes in the cluster to restart them.  The difference is that “Shut down” attempts a graceful shutdown from inside the guest OS, whereas “Power off” makes no attempt to shut down the OS and simply powers off the VM.

However, something many of us probably aren’t aware of is that the isolation response doesn’t apply to FT protected VMs.  When FT is enabled on a VM, it gets excluded from HA actions.  As such, if a host that is running an FT primary gets isolated, the FT protected VM will continue to run on the isolated host regardless of the cluster’s isolation response.
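
To make the behavior concrete, here is a small toy model of the decision described above. It is only a sketch: the names (IsolationResponse, VM, action_on_isolation) are made up for illustration and are not part of any vSphere API, and the returned strings simply paraphrase the outcomes discussed in this post.

```python
from dataclasses import dataclass
from enum import Enum


class IsolationResponse(Enum):
    # The three cluster-level options listed above.
    LEAVE_POWERED_ON = "leave powered on"
    POWER_OFF = "power off"
    SHUT_DOWN = "shut down"


@dataclass
class VM:
    name: str
    ft_protected: bool = False  # hypothetical flag: True if FT is enabled


def action_on_isolation(vm: VM, cluster_response: IsolationResponse) -> str:
    """Toy model: what happens to a VM when its host becomes isolated."""
    if vm.ft_protected:
        # FT protected VMs are excluded, so the cluster setting is ignored.
        return "keeps running on the isolated host"
    if cluster_response is IsolationResponse.LEAVE_POWERED_ON:
        return "keeps running on the isolated host"
    if cluster_response is IsolationResponse.SHUT_DOWN:
        return "graceful guest shutdown attempted, locks released for restart"
    # Remaining case: POWER_OFF, with no attempt to shut down the guest OS.
    return "powered off, locks released for restart"


web = VM("web01")
db = VM("db01", ft_protected=True)
for response in IsolationResponse:
    print(response.value, "|", web.name, "->", action_on_isolation(web, response))
    print(response.value, "|", db.name, "->", action_on_isolation(db, response))
```

Notice that the FT protected VM gets the same answer for all three settings, which is exactly the point of this post.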

This is an important point to remember when running FT in your HA/DRS clusters.  To avoid a situation where an FT primary VM gets stuck on an isolated host, it’s important to have adequate network redundancy for both your management and FT logging networks.
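
As a rough illustration of what checking for redundancy might look like, the sketch below flags any network that has fewer than two distinct uplinks. Everything here is an assumption for the sake of the example: the function name, the hand-built inventory dictionary, and the "at least two uplinks" threshold are hypothetical, and in practice you would gather this data from your own environment and apply your own design rules.

```python
from typing import Dict, List


def networks_lacking_redundancy(
    uplinks_by_network: Dict[str, List[str]], minimum: int = 2
) -> List[str]:
    """Return the networks with fewer than `minimum` distinct uplinks."""
    return sorted(
        name
        for name, nics in uplinks_by_network.items()
        if len(set(nics)) < minimum  # set() ignores duplicate NIC entries
    )


# Hypothetical inventory for a single host, typed in by hand.
host_uplinks = {
    "management": ["vmnic0", "vmnic1"],
    "ft_logging": ["vmnic2"],
}

print(networks_lacking_redundancy(host_uplinks))  # prints ['ft_logging']
```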

Regards,
-Kyle