This blog was authored by Tim George.
The January release of VMware Aria Operations gave us an exciting new feature: high availability (HA) for application monitoring! If you are familiar with application monitoring using Telegraf in VMware Aria Operations, you already know that it is highly dependent on Cloud Proxies. Any data collected from endpoints is pushed to VMware Aria Operations through these Cloud Proxies, and the application monitoring ARC adapters are the only adapters that push data from endpoints (all other management packs use the pull method). Previously, these ARC adapters didn’t support Collector Groups, so the Cloud Proxy was a single point of failure for application monitoring: if the Cloud Proxy failed, data from the endpoints wouldn’t reach Aria Operations. To remove this limitation, we added support for application monitoring through Collector Groups, so that if one Cloud Proxy fails, metrics can still flow through another Cloud Proxy in the Collector Group, making application monitoring highly available.
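To make the failover idea concrete, here is a minimal conceptual sketch in Python. It is not how Aria Operations or the Telegraf agent implements HA internally; the class names, the `push` method, and the "try the next proxy" strategy are all assumptions chosen for illustration. It only shows why a group of proxies removes the single point of failure that one proxy represents.

```python
from dataclasses import dataclass, field


@dataclass
class CloudProxy:
    """Hypothetical stand-in for a Cloud Proxy that receives pushed metrics."""
    name: str
    healthy: bool = True
    received: list = field(default_factory=list)

    def push(self, metric: str) -> None:
        # A failed proxy cannot accept data from the endpoint.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        self.received.append(metric)


@dataclass
class CollectorGroup:
    """Hypothetical group of proxies; any healthy member can take the data."""
    name: str
    proxies: list

    def push(self, metric: str) -> str:
        # Try each member in order and return the name of the one that accepted.
        for proxy in self.proxies:
            try:
                proxy.push(metric)
                return proxy.name
            except ConnectionError:
                continue
        raise ConnectionError(f"no healthy Cloud Proxy in {self.name}")


if __name__ == "__main__":
    group = CollectorGroup("cg-1", [CloudProxy("cp-a"), CloudProxy("cp-b")])
    print(group.push("cpu=42"))       # accepted by cp-a
    group.proxies[0].healthy = False  # simulate a Cloud Proxy failure
    print(group.push("cpu=43"))       # accepted by cp-b
```

With a single proxy, the second push would simply fail; with the group, the metric still arrives through the remaining member.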
The first item we wanted to take care of was the creation of these Collector Groups. We simplified the experience and made it much easier to add new groups and enable/disable high availability from within this UI.
Once we have added a new Collector Group, we can filter by group when we look at all our Cloud Proxies. We can group our proxies by Collector Group and see each of the Cloud Proxies that make up a group, or look at only our ungrouped Cloud Proxies.
There is also a mechanism to retry configuration when the membership of a Collector Group changes. That is, whenever a Cloud Proxy is added or removed, we have the option to “Retry Cloud Proxy Configuration” from this screen, as well as to activate or deactivate data persistence.
To put this into practice, we also need to talk about a few important characteristics of this new feature. The first is that a bootstrap or re-bootstrap of the Telegraf agent is required in order to use HA. Older versions of the agent will not be able to handle the changes that have been made. Of course, this can be done from within Aria Operations by going to “Environment → Applications → Manage Telegraf Agents”. When installing or re-installing these agents, we now get a different pop-up than the one we used to see. This pop-up allows us to install the agent and assign it to either a Cloud Proxy or a Collector Group, depending on whether we want high availability. To use high availability, we select the corresponding radio button and then choose the Collector Group we wish to assign the agent to.