VMware vSphere Blog


vSphere HA VM Monitoring – Back to Basics

In my experience as a customer, as a partner, and now working for VMware, I’ve found HA VM monitoring to be an incredibly helpful feature, and I am consistently surprised that it is not used more. It is easy to turn on, provides an additional layer of protection for your VMs, and just works. So why don’t more people use it? I won’t be able to answer that question in this post, but I hope to provide enough information to get more people to try it out. First, I’d like to briefly discuss what HA VM monitoring is; then I’ll walk through how to turn it on and configure it.

What is vSphere HA VM monitoring? HA VM monitoring will restart a VM if:

  • That VM’s VMware Tools heartbeats have not been received within a set period of time (the failure interval; see below for details), and
  • The VM isn’t generating any storage or network IO (over the last 120 seconds by default; this interval can be changed with the advanced cluster-level setting das.iostatsInterval)
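
The two conditions can be sketched as a simple Python predicate. This is a simplified model for illustration only, assuming HA compares the time since the last VMware Tools heartbeat with the failure interval and the time since the last storage or network IO with das.iostatsInterval; `should_reset_vm` and its parameters are hypothetical names, not part of any vSphere API.

```python
def should_reset_vm(
    secs_since_heartbeat: float,
    secs_since_last_io: float,
    failure_interval: float,
    iostats_interval: float = 120.0,
) -> bool:
    """Simplified, hypothetical model of the HA VM monitoring reset decision.

    Both conditions must hold: VMware Tools heartbeats have been missing for
    at least the failure interval, AND the VM has generated no storage or
    network IO during the IO statistics interval (das.iostatsInterval,
    120 seconds by default).
    """
    heartbeats_missing = secs_since_heartbeat >= failure_interval
    io_idle = secs_since_last_io >= iostats_interval
    return heartbeats_missing and io_idle


# Heartbeats lost, but the VM is still doing IO: no reset.
print(should_reset_vm(secs_since_heartbeat=90, secs_since_last_io=5, failure_interval=60))
# Heartbeats lost and no IO for more than two minutes: reset.
print(should_reset_vm(secs_since_heartbeat=90, secs_since_last_io=130, failure_interval=60))
```

Requiring both signals is what protects a busy but healthy VM: a VM whose Tools heartbeats are delayed but which is still reading and writing is left alone.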

Why would VMware Tools stop sending heartbeats and the VM stop generating IO? Most likely because the guest operating system has crashed (e.g., a Blue Screen of Death) or has otherwise become unresponsive. At that point, the best way to keep the application as available as possible is to reset the VM.

What if something displayed on the screen points to the cause of the crash? Won’t that be lost when the VM is reset? No. To assist with troubleshooting the cause of the OS crash, a screenshot of the VM is taken just before the VM is reset and is stored with the VM’s files.

When exactly will the VM be reset? That depends on the monitoring sensitivity. There are three built-in presets (Low, Medium and High), and you can also enter custom values for any of these settings.

Preset   Failure interval   Minimum uptime   Maximum per-VM resets   Maximum resets time window
Low      120 secs           480 secs         3                       7 days
Medium   60 secs            240 secs         3                       24 hrs
High     30 secs            120 secs         3                       1 hr
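
Expressing every column of the preset table in seconds makes the presets easier to compare (1 hr = 3,600 s, 24 hrs = 86,400 s, 7 days = 604,800 s). A minimal Python sketch of the table as data; `MonitoringPreset` and `PRESETS` are hypothetical names chosen for illustration, not vSphere objects.

```python
from dataclasses import dataclass

HOUR = 3600          # seconds
DAY = 24 * HOUR      # seconds


@dataclass(frozen=True)
class MonitoringPreset:
    failure_interval: int  # seconds without heartbeats before a reset
    min_uptime: int        # seconds after power-on before monitoring starts
    max_resets: int        # maximum per-VM resets ...
    reset_window: int      # ... within this many seconds


# Values from the preset table, converted to seconds.
PRESETS = {
    "Low": MonitoringPreset(120, 480, 3, 7 * DAY),
    "Medium": MonitoringPreset(60, 240, 3, 24 * HOUR),
    "High": MonitoringPreset(30, 120, 3, 1 * HOUR),
}

for name, p in PRESETS.items():
    print(f"{name:6} failure={p.failure_interval}s uptime={p.min_uptime}s "
          f"resets={p.max_resets} per {p.reset_window}s")
```

Written this way, it is easy to see that in every preset the minimum uptime is four times the failure interval, while the reset window grows from 1 hour (High) to 7 days (Low).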

What do the different options mean?

  • Failure interval: HA will restart the VM if its VMware Tools heartbeat has not been received within this interval (and the VM has generated no storage or network IO)
  • Minimum uptime: after a VM is started, HA waits this long before it begins monitoring VMware Tools heartbeats and storage and network IO
  • Maximum per-VM resets: HA will restart a given VM at most this many times within the maximum resets time window
  • Maximum resets time window: the time window within which the maximum per-VM resets limit applies (for example, with the High preset, at most 3 resets within 1 hour)
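
The interaction between minimum uptime, maximum per-VM resets and the reset time window can be modeled as a small per-VM tracker. This is a hypothetical sketch, not HA’s implementation: it assumes the window is measured backwards from the current time (a sliding window), that the VM was powered on at time 0, and that each reset boots the VM again so the minimum uptime applies anew; the class and method names are illustrative only.

```python
from collections import deque


class VmResetTracker:
    """Hypothetical model of HA VM monitoring limits for a single VM.

    All times are in seconds on a shared clock. Assumes a sliding window.
    """

    def __init__(self, min_uptime: int, max_resets: int, reset_window: int):
        self.min_uptime = min_uptime
        self.max_resets = max_resets
        self.reset_window = reset_window
        self.powered_on_at = 0.0   # assumption: VM powered on at t = 0
        self.reset_times = deque()  # timestamps of recent resets

    def monitoring_active(self, now: float) -> bool:
        # No heartbeat/IO monitoring until the VM has been up for min_uptime.
        return now - self.powered_on_at >= self.min_uptime

    def try_reset(self, now: float) -> bool:
        """Return True (and record the reset) if a reset is allowed now."""
        if not self.monitoring_active(now):
            return False
        # Forget resets that have fallen outside the time window.
        while self.reset_times and now - self.reset_times[0] >= self.reset_window:
            self.reset_times.popleft()
        if len(self.reset_times) >= self.max_resets:
            return False  # limit reached: HA leaves the VM alone
        self.reset_times.append(now)
        # Assumption: the reset VM boots again, so min_uptime applies anew.
        self.powered_on_at = now
        return True


# High preset: 120 s minimum uptime, at most 3 resets per 1-hour window.
tracker = VmResetTracker(min_uptime=120, max_resets=3, reset_window=3600)
print([tracker.try_reset(t) for t in (100, 200, 400, 600, 800, 3900)])
# [False, True, True, True, False, True]
```

The first attempt is ignored because the VM has not yet been up for 120 seconds, the fourth reset within the hour is refused, and resets are allowed again once the oldest one falls outside the window.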

How do you enable and configure HA VM monitoring?

  • Select the cluster where you want to enable HA VM monitoring, then select Manage > Settings > Services > vSphere HA and click Edit

[Screenshot: Edit HA settings]

  • Under VM Monitoring > VM Monitoring Status, select VM Monitoring Only

[Screenshot: HA VM Monitoring settings]

  • For Monitoring Sensitivity, select a preset or choose custom settings

If you want to exempt specific VMs from VM Monitoring, or give them different settings, use the Cluster > VM Overrides setting.

[Screenshot: Cluster VM Overrides]
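
Conceptually, a VM override takes precedence over the cluster-level setting for that one VM, while every setting the override does not specify falls back to the cluster default. A minimal Python sketch under that assumption; the dictionaries, the VM names and the `effective_vm_monitoring` helper are hypothetical and only mirror the idea behind Cluster > VM Overrides, not how vCenter actually stores it.

```python
from typing import Dict


def effective_vm_monitoring(
    vm_name: str,
    cluster_default: Dict[str, str],
    overrides: Dict[str, Dict[str, str]],
) -> Dict[str, str]:
    """Merge a VM's overrides (if any) on top of the cluster default.

    Hypothetical model: any key set in the VM override wins; all other
    keys fall back to the cluster-level value.
    """
    merged = dict(cluster_default)
    merged.update(overrides.get(vm_name, {}))
    return merged


cluster = {"status": "VM Monitoring Only", "sensitivity": "Medium"}
overrides = {
    "busy-db01": {"sensitivity": "Low"},     # high-load VM: less sensitive
    "appliance01": {"status": "Disabled"},  # exempt from VM Monitoring
}

for vm in ("web01", "busy-db01", "appliance01"):
    print(vm, effective_vm_monitoring(vm, cluster, overrides))
```

Keeping the cluster default strict and loosening it per VM matches the approach of treating unusual VMs as exceptions rather than the rule.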

I look forward to hearing about your experiences with HA VM Monitoring and HA in general.

For future updates follow me on Twitter: @gurusimran

7 thoughts on “vSphere HA VM Monitoring – Back to Basics”

  1. larstr

    I think the reason it’s not used more is that it has not been a flawless feature. I had a customer whose VM would suddenly reboot under high load. It turned out that HA VM Monitoring was causing it. We set the sensitivity to Low, but the VM still kept rebooting randomly, so we eventually turned it off, and it hasn’t rebooted for this reason ever since.

    Things may have improved in 5.5, so it could be worth enabling it again.

    Lars

  2. Joel

    I think people are skeptical of certain types of automation, particularly automatic rebooting. I have faith in HA and DRS, but for some reason it feels sacrilegious to reboot a guest without intervention.

  3. Jeff Hunter

    In response to larstr and Joel above, I agree there are exceptions to nearly every “policy”. There may be a VM under high load where it makes sense to override the default policy (either with different settings or by turning it off completely). However, if a high-load VM is not able to send VMware Tools heartbeats for a solid two minutes (the 120-second failure interval), then I suspect you have bigger problems: the VM is not sized correctly, the network is overloaded, etc. As for automatic rebooting, again I think there are a few scenarios where you would not want it, but these are probably the exception, not the rule. In most environments uptime is king, and using vSphere HA VM and App monitoring contributes to that effort.

  4. Brian Graf

    Excellent write-up, GS! From the customers I have talked to, the biggest reason for not using it is the fear that it may “break” something. If their VMs are currently running without issues, they think “if it ain’t broke, don’t fix it”. However, if and when their VMs do crash, they wish they’d enabled it. I think more people would enable this if they saw how many people DO use it and realized, “Hey, this really is a good thing to enable”.

  5. Brian

    How about an option to alert when these conditions are met, before fully enabling the automated reboot option? That would give us an opportunity to see how it would behave in our environments beforehand.

  6. Herschelle

    What happens to VMs that don’t have VMware Tools installed (e.g., some virtual appliances) if you enable VM monitoring at the cluster level? Would they just keep rebooting? Or, because it also checks the IO, would they be OK?
