Home > Blogs > VMware vSphere Blog


HA split brain, which VM prevails?

By Duncan Epping, Principal Architect.

I received a question on Twitter last week about HA split-brain scenarios. Let me first give an example of when a split-brain scenario could occur:

  • Isolation response = leave powered on
  • iSCSI / NFS storage

When the above two conditions are met and a host in your cluster is fully network isolated, HA will be able to restart its virtual machines, because to HA it will appear as if the host has completely failed. The reasons for this are:

  1. There will be no network heartbeats coming from this host
  2. There will be no datastore heartbeats 
  3. File locks on virtual machines are released by the "isolated host" as it cannot access storage any longer
  4. The management address of this host cannot be pinged
  5. The "isolated host" cannot write to the datastore to inform the master it is isolated
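To make the reasoning above concrete, here is a minimal Python sketch of how the signals in the list combine from the master's point of view. It is an illustrative model only, not HA's actual implementation: the `HostObservation` class, the `classify_host` function, and the state names are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class HostObservation:
    """Signals the HA master can observe for one host (illustrative model)."""
    network_heartbeat: bool    # heartbeats arriving over the management network
    datastore_heartbeat: bool  # heartbeats visible on the heartbeat datastores
    ping_reply: bool           # management address answers a ping
    isolation_marker: bool     # host managed to record "I am isolated" on a datastore


def classify_host(obs: HostObservation) -> str:
    """Return the master's conclusion about the host (simplified)."""
    if obs.network_heartbeat:
        return "alive"
    # Any remaining sign of life means the host is still running somewhere,
    # so the master should not treat it as failed.
    if obs.datastore_heartbeat or obs.ping_reply or obs.isolation_marker:
        return "isolated-or-partitioned"
    # No signal at all: indistinguishable from a crashed host.
    return "failed"


# A host that lost both its management and storage networks:
fully_isolated = HostObservation(False, False, False, False)
print(classify_host(fully_isolated))  # prints "failed"
```

With IP-based storage on the same failed network, every signal disappears at once, which is why the master cannot tell a fully isolated host from a dead one.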

On top of that, the isolated host will take no action, as "leave powered on" was selected as the isolation response. In other words, the virtual machines running on the isolated host will simply remain up and running. Because it appears to the master that the host has failed, it will initiate the restart of the impacted VMs. And because the host is fully isolated, including from the storage network, the VMs can be powered on elsewhere once the "file lock" that the isolated host held times out.

Now, as soon as the isolated host returns, you will have two instances of the same VM on the network. However, only one of these has disk access. The one which doesn't have disk access will automatically be killed by the host it is running on. This behavior was introduced in vSphere 4 U2 and still applies today.
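The rule for which VM prevails can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not ESXi's real mechanism: the `VmInstance` class, the `resolve_split_brain` function, and the host names `esx01` and `esx02` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmInstance:
    host: str            # host the instance runs on (hypothetical names)
    running: bool = True


def resolve_split_brain(instances: List[VmInstance], lock_owner: str) -> List[str]:
    """Kill every instance whose host does not hold the VM's file lock.

    Returns the hosts on which the VM is still running afterwards.
    """
    for inst in instances:
        if inst.running and inst.host != lock_owner:
            inst.running = False  # this host lost the lock, so it kills its copy
    return [inst.host for inst in instances if inst.running]


# esx01 was isolated and kept its VM running; esx02 restarted the VM
# after the old lock timed out and therefore now holds the lock.
copies = [VmInstance("esx01"), VmInstance("esx02")]
print(resolve_split_brain(copies, lock_owner="esx02"))  # prints ['esx02']
```

In this model the restarted copy always wins, because the isolated host gave up its lock when it lost storage access.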

Of course, this whole situation can be prevented: just change the "isolation response" to "power off", which is what we recommend!


About Duncan Epping

Duncan Epping is Principal Architect at VMware (R&D, Integration Engineering) and is focused on vCloud / vSphere architecture and integration. He was among the first VMware certified design experts (VCDX 007). He is the co-author of several books, including best seller vSphere 5.1 Clustering Technical Deepdive. He is the owner and main author of the leading virtualization blog yellow-bricks.com.

12 thoughts on “HA split brain, which VM prevails?”

  1. Good post Duncan! I’ve found that customers are sometimes confused by the isolation response settings – I’ve known people to believe that “power off” means that the VM would not fail over. In other words, they think that the isolation response is the end result of what would happen to a VM, rather than an action that is carried out at the time of host isolation.

  2. It seems like “Leave Powered On” is fine because the split-brain problem is already being handled well.
    When exactly does the recovered ESXi host realize its VM is a phantom because another host is running the VM? It sees the lock the other host has on the VM’s files, right? Does it wait until the VM has cause to write to the .vmdk? Or does it proactively check the lock on the .vmx when the host regains network connectivity, regardless of what the guest OS inside the VM is doing?
    I believe it realizes the phantomness very quickly and therefore shuts down the phantom VM very promptly after the ESXi host recovers network connectivity. So there’s very little or no time where the VM and phantom VM are conflicting on the network, so where’s the harm? In that case, I don’t see much value in setting isolation response to power off or shutdown. It seems like the split-brain scenario is being handled well already.
    I generally favor “Leave Powered On” as the isolation response. When it’s hard to choose between options–e.g. because they each address very unlikely scenarios and you don’t know which is more unlikely–I favor the option where I don’t have to explain to non-technical managers why our high availability software CAUSED downtime! :-)

  3. No. “Leave Powered On” is fine with NFS and iSCSI. Split brain is no longer a problem (unless you’re running vSphere older than 4 U2).

  4. On further thought, I’m not sure what Host Isolation Response makes the most sense with vSphere 5 and IP-based storage (NFS or iSCSI).
    “Leave Powered On” risks a split-brain scenario for a brief period, though that risk is mitigated by the vSphere 4 U2 and later feature that shuts down the phantom VM when it finds it no longer has .vmdk locks. I would think in most cases this would be little or no problem, but maybe I’m wrong.
    Here’s the scenario that bothers me. What happens if all hosts in a cluster lose all network and all datastores? (If an admin goofs and causes a network configuration problem, it’s fairly likely that all hosts in the cluster will be affected.) And what happens if all hosts come back online after HA Isolation Responses have been executed? If the VMs were “Leave Powered On”ed, will the VMs likely just come back online, or did they likely crash due to not having access to their disks? If the VMs were “Power Off”ed, will HA begin to restart them as soon as some hosts come back online? Which produces the worse results here? Does it matter how long the all-hosts-no-network-no-datastores outage lasts?

  5. To clarify:
    I understand that the VMX process will remain on the isolated host, but the VMs should be in an abnormal state because they have no access to the storage (e.g. a BSOD?)

  6. Test it, Vincent. When I tested it, I also expected that the VM would BSOD. But it just keeps on retrying the IO and just sits there doing nothing, unfortunately.

  7. OK, there’s no BSOD, but the VM is not usable? You can’t do anything on it? (There’s no VMDK to write to.) Does the VM respond to ping?
    If the VM can access its network, what would happen if a customer tries to RDP to it?

  8. Is it pingable? Yes
    Can you RDP? Yes
    Can it write to disk? No
    That is my experience, but experience may vary based on the OS / install / apps etc. I would suggest testing it :-)

  9. Thanks for your answers.
    But I still can’t understand how the VM could work without access to its VMDK.
    For me in this case, ok you can RDP, ping, but nothing should work.

  10. It will work because a good portion of the OS is loaded into memory. The OS does regularly attempt to write to disk, but usually will not fail immediately when the disk is not reachable; if the disk stays unreachable for too long it may (and probably will) BSOD.
    As an example, I recall one person with a laptop whose entire C:\Windows directory appeared to have become corrupted while the machine was on; the directory showed no contents when opened, and you could not run any utilities inside that folder. However, the machine remained up and usable because the critical OS files were loaded into memory, so unless you tried to do updates things tended to work, though sometimes buggily.
    I recall from the VCP test that Windows Server may have issues after a 60-second timeout when actually trying to read the disk, but that timeout can be controlled by a registry entry. Linux, in my experience, can become sluggish when it has trouble accessing a failing or non-responsive disk (as it continually retries the IO request).
