vSphere HA VM Monitoring - Back to Basics

In my experience as a customer, as a partner and as a VMware employee, I’ve found HA VM monitoring to be an incredibly helpful feature, and I am consistently surprised that it is not used more. It is easy to turn on, provides an additional layer of protection for your VMs and just works. So why don’t more people use it? I can’t answer that question in this post, but I hope to provide enough information to get more people to try it out. First I’ll briefly discuss what HA VM monitoring is, then I’ll walk through how to turn it on and configure it.

What is vSphere HA VM monitoring? HA VM monitoring will restart a VM if:

That VM’s VMware Tools heartbeats are not received within a set period of time (see below for details), and
The VM isn’t generating any storage or network IO (for 120 seconds by default, though this can be changed using the cluster-level advanced setting das.iostatsInterval)
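The two conditions above combine with a logical AND, which a short sketch can make concrete. This is a simplified model for illustration only, not VMware's implementation: the function name and parameters are hypothetical, and only the 120-second default comes from the text above.

```python
def should_reset(secs_since_heartbeat: float,
                 secs_since_io: float,
                 heartbeat_timeout: float,
                 io_stats_interval: float = 120.0) -> bool:
    """Simplified model of the HA VM monitoring decision (hypothetical helper).

    heartbeat_timeout stands in for the "set period of time" for heartbeats;
    io_stats_interval mirrors the default of das.iostatsInterval (120 seconds).
    """
    heartbeats_missing = secs_since_heartbeat >= heartbeat_timeout
    io_silent = secs_since_io >= io_stats_interval
    # Both conditions must hold: missing heartbeats alone are not enough.
    return heartbeats_missing and io_silent


# Heartbeats have stopped, but the VM wrote to disk 10 seconds ago: no reset.
print(should_reset(secs_since_heartbeat=45, secs_since_io=10, heartbeat_timeout=30))
# Heartbeats have stopped and there has been no IO for over 120 seconds: reset.
print(should_reset(secs_since_heartbeat=45, secs_since_io=130, heartbeat_timeout=30))
```

The IO check is what guards against resetting a VM whose VMware Tools service has merely stopped while the guest is still doing real work.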

Why would VMware Tools stop sending heartbeats and the VM stop generating IO? More than likely because the guest operating system on the VM has crashed (e.g. a Blue Screen of Death) or has otherwise become unresponsive. At this point the best way to keep the application as available as possible is to reset the VM.

What if something related to the cause of the crash is displayed on the screen? If the VM is reset, that is going to be lost, right? No. To assist with troubleshooting the cause of the OS crash, a screenshot of the VM is taken just before the reset and placed with the VM’s files.

When exactly will the VM be reset? There are three built-in presets (Low, Medium and High), plus the option to set custom values for any of these settings.

Preset	Failure interval	Minimum uptime	Maximum per-VM resets	Maximum resets time window
Low	120 secs	480 secs	3	7 days
Medium	60 secs	240 secs	3	24 hrs
High	30 secs	120 secs	3	1 hr

What do the different options mean?

Failure interval: HA will restart the VM if no VMware Tools heartbeat has been received within this interval (and the VM has generated no storage or network IO, as described above)
Minimum uptime: HA will wait this long after a VM is started before it begins monitoring for VMware Tools heartbeats, storage IO and network IO
Maximum per-VM resets: HA will restart the VM a maximum of this many times within the “Maximum resets time window”
Maximum resets time window: the period over which resets are counted toward “Maximum per-VM resets”
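To see how the four settings interact, here is a minimal sketch that encodes the preset table and checks whether a reset would be permitted. It is an illustrative model under stated assumptions, not HA's actual code: the `Sensitivity` class, the `reset_allowed` function and the treatment of the time window as a rolling window are all hypothetical, and the IO check is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensitivity:
    failure_interval: int  # seconds without a heartbeat before a reset
    minimum_uptime: int    # seconds after power-on before monitoring starts
    max_resets: int        # maximum per-VM resets inside the window
    reset_window: int      # maximum resets time window, in seconds


# Values taken from the preset table above, converted to seconds.
PRESETS = {
    "Low": Sensitivity(120, 480, 3, 7 * 24 * 3600),
    "Medium": Sensitivity(60, 240, 3, 24 * 3600),
    "High": Sensitivity(30, 120, 3, 3600),
}


def reset_allowed(preset, now, powered_on_at, last_heartbeat_at, previous_resets):
    """Return True if this simplified model would reset the VM at time `now`.

    All times are seconds on a shared clock; `previous_resets` lists the
    times of earlier resets of this VM.
    """
    if now - powered_on_at < preset.minimum_uptime:
        return False  # still inside the minimum uptime grace period
    if now - last_heartbeat_at < preset.failure_interval:
        return False  # a heartbeat arrived recently enough
    # Assumption: only resets inside a rolling window count toward the limit.
    recent = [t for t in previous_resets if now - t < preset.reset_window]
    return len(recent) < preset.max_resets


high = PRESETS["High"]
# Three resets already happened within the last hour, so a fourth is refused.
print(reset_allowed(high, now=5000, powered_on_at=4000,
                    last_heartbeat_at=4900, previous_resets=[2000, 3000, 4000]))
```

The reset limit is what stops a VM that crashes on every boot from being reset indefinitely. Once the limit is reached, the VM is left alone until older resets fall outside the window.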

How do you enable and configure HA VM monitoring?

Select the cluster where you want to enable HA VM monitoring, then select Manage > Settings > Services > vSphere HA and click the Edit button.