In my experience as a customer, as a partner and as a VMware employee, I've found HA VM monitoring to be an incredibly helpful feature, and I am consistently surprised that it is not used more. It is easy to turn on, provides an additional layer of protection for your VMs and just works. So why don't more people use it? I can't answer that question in this post, but I hope to provide enough information to get more people to try it out. First I'll briefly discuss what HA VM monitoring is, then I'll walk through how to turn it on and configure it.
What is vSphere HA VM monitoring?
HA VM monitoring will restart a VM if:
- That VM's VMware Tools heartbeats are not received within a set period of time (see below for details), and
- The VM isn't generating any storage or network IO (for 120 seconds by default, though this can be changed using the following advanced cluster level setting: das.iostatsInterval)
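The two conditions above combine with a logical AND, which a few lines of Python can make concrete. This is only an illustrative sketch, not how HA is implemented: the function and parameter names are hypothetical, and the heartbeat timeout is passed in because it depends on the sensitivity you choose (covered further down).

```python
def should_reset(secs_since_heartbeat, secs_since_io,
                 failure_interval, io_stats_interval=120):
    """Return True only when BOTH conditions from the list above hold.

    io_stats_interval defaults to 120 seconds, the default the post
    gives for das.iostatsInterval. All names here are hypothetical.
    """
    heartbeat_missed = secs_since_heartbeat >= failure_interval
    io_idle = secs_since_io >= io_stats_interval
    return heartbeat_missed and io_idle
```

For example, `should_reset(45, 10, failure_interval=30)` is `False`: heartbeats have stopped, but the VM generated IO 10 seconds ago, so it is left alone.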
Why wouldn't VMware Tools send heartbeats and the VM stop generating IO? More than likely because the guest operating system on the VM has crashed (e.g. a Blue Screen of Death) or has otherwise become unresponsive. At this point the best thing to do to keep the application as available as possible is to reset the VM.
What if something related to the cause of the crash is displayed on the screen? If the VM is reset, that is going to be lost, right? No. To assist with troubleshooting the cause of the OS crash, a screenshot of the VM is taken just before the reset and placed with the VM's files.
When exactly will the VM be reset? There are three built-in presets (Low, Medium and High), plus the option to select custom values for any of these settings.
|Sensitivity|Failure interval|Minimum uptime|Maximum per-VM resets|Maximum resets time window|
|---|---|---|---|---|
|Low|120 secs|480 secs|3|7 days|
|Medium|60 secs|240 secs|3|24 hrs|
|High|30 secs|120 secs|3|1 hr|
What do the different options mean?
- Failure interval: HA will restart the VM if the VM heartbeat has not been received in this interval
- Minimum uptime: HA will wait this long after a VM is started before it begins monitoring for VMware Tools heartbeats, storage and network IO
- Maximum per-VM resets: HA will restart the VM a maximum of this many times within the "Maximum resets time window"
- Maximum resets time window: (see "Maximum per-VM resets" above)
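To see how the four options interact, here is a small Python sketch that encodes the presets from the table and decides whether another reset would be allowed. It is a simplified model under stated assumptions, not VMware's implementation: the `Sensitivity` class, `PRESETS` and `may_reset` are hypothetical names, all times are plain seconds on one clock, and the IO check is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensitivity:
    failure_interval: int  # seconds without a heartbeat before a reset is considered
    minimum_uptime: int    # seconds after start before monitoring begins
    max_resets: int        # resets allowed inside the window
    reset_window: int      # length of the window, in seconds


# Values taken from the preset table, converted to seconds.
PRESETS = {
    "Low": Sensitivity(120, 480, 3, 7 * 24 * 3600),
    "Medium": Sensitivity(60, 240, 3, 24 * 3600),
    "High": Sensitivity(30, 120, 3, 3600),
}


def may_reset(preset, now, powered_on_at, last_heartbeat_at, past_resets):
    """Decide whether a reset is allowed at time `now` (hypothetical model)."""
    if now - powered_on_at < preset.minimum_uptime:
        return False  # still inside the minimum-uptime grace period
    if now - last_heartbeat_at < preset.failure_interval:
        return False  # a heartbeat arrived recently enough
    # Count only the resets that fall inside the time window.
    recent = [t for t in past_resets if now - t < preset.reset_window]
    return len(recent) < preset.max_resets
```

With the High preset, a VM that has already been reset three times in the last hour gets `False` from `may_reset`, which matches the idea that HA stops resetting once the per-VM maximum is reached within the window.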
How do you enable and configure HA VM monitoring?
- Select the cluster where you want to enable HA VM monitoring then select Manage > Settings > Services > vSphere HA and click the Edit button
- Under VM Monitoring > VM Monitoring Status select VM Monitoring Only
- For Monitoring Sensitivity select a preset or choose custom settings
If you want to exempt VMs from VM Monitoring, use the Cluster > VM Overrides setting.
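If you would rather script these choices than click through them, the values first have to be expressed in seconds. The sketch below only builds plain dictionaries and does not talk to vCenter. The key names (`vmMonitoring`, `failureInterval`, `minUpTime`, `maxFailures`, `maxFailureWindow`) and the values `vmMonitoringOnly` and `vmMonitoringDisabled` are assumed to mirror the vSphere API's naming, so verify them against the API reference before handing them to a real client. The function names are hypothetical.

```python
# Preset values from the post, in seconds:
# (failure interval, minimum uptime, max resets, resets time window)
PRESETS = {
    "Low": (120, 480, 3, 7 * 24 * 3600),
    "Medium": (60, 240, 3, 24 * 3600),
    "High": (30, 120, 3, 3600),
}


def vm_monitoring_settings(preset="Medium", custom=None):
    """Describe the cluster-level setting; `custom` overrides the preset."""
    failure_interval, min_uptime, max_resets, window = custom or PRESETS[preset]
    return {
        "vmMonitoring": "vmMonitoringOnly",  # assumed name for "VM Monitoring Only"
        "failureInterval": failure_interval,
        "minUpTime": min_uptime,
        "maxFailures": max_resets,
        "maxFailureWindow": window,
    }


def exempt_vm_override(vm_name):
    """Describe a per-VM override that turns monitoring off for one VM."""
    return {"vm": vm_name, "vmMonitoring": "vmMonitoringDisabled"}  # assumed name
```

Calling `vm_monitoring_settings("High")` gives a 30 second failure interval and a one hour window, while `vm_monitoring_settings(custom=(45, 200, 2, 7200))` models the custom option.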
I look forward to hearing about your experiences with HA VM Monitoring and HA in general.
For future updates follow me on Twitter: @gurusimran