Pager duty. Sigh.
For sysadmins who manage 24/7/365, mission-critical systems or global operations, a “follow-the-sun” model is part of the job. But when you are NOT on duty, you would prefer not to receive any notifications. So an important function of alerting is the ability to notify different people depending on who is scheduled to be on duty. Schedule-based alerts are useful even during a single shift: you can plan coverage for lunches and regular meetings, or temporarily disable alerts to allow focused time for special projects.
With vFabric Hyperic, you can set up “follow-the-sun” notifications in a few steps.
The High-Level Steps:
Essentially, to enable follow-the-sun alerting we simply need to set up roles and schedules for our sysadmins. However, those schedules are only useful in the context of an actual alert, so we will take you through the full process of setting up a globally enabled alert. The steps are:
1. Decide on your Alert
2. Set Up the Condition and Action for the Alert
3. Set Up Roles with Alert Calendars (this specifies who gets notified around the clock)
4. Set Up the Escalation Scheme
Note: although we show the alert as the first step to provide context, roles and escalation schemes must exist before they can be selected as an action on the Alerts tab in step 2.
Decide on your Alert
First, we need to decide what alert we want on what resource. A resource is defined by the HQ Inventory Model:
- Platforms (e.g. a Linux OS or Network Device)
- Servers (e.g. a software product running like Tomcat)
- Services (e.g. a particular platform or server resource like CPU or a component of the server)
You can alert people based on changes to a target resource’s properties, actions, logs, or configuration:
- Properties are characteristics of the resources.
- Control Actions are things like start, stop, restart, configtest and can include status like in-progress or failed.
- Logs are monitored by match strings for keywords.
- Configuration looks at config files.
So, as an example, we may want to set up an alert if a service is down for all vFabric tc Server instances of Tier 2 Applications in our US data centers.
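To make the example concrete, here is a minimal sketch of the platform, server and service hierarchy and of picking out the services that match our example. It is not the Hyperic API: the class names, the `tier` and `data_center` tags, the host names and the `down_services` helper are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Service:
    name: str
    available: bool = True


@dataclass
class Server:
    name: str
    server_type: str          # e.g. "tc Server"
    tier: int                 # hypothetical tag for the application tier
    services: List[Service] = field(default_factory=list)


@dataclass
class Platform:
    name: str
    data_center: str          # hypothetical tag, e.g. "US-West"
    servers: List[Server] = field(default_factory=list)


def down_services(platforms: List[Platform], server_type: str, tier: int,
                  data_center_prefix: str) -> List[Tuple[str, str, str]]:
    """Return (platform, server, service) for every matching service that is down."""
    hits = []
    for platform in platforms:
        if not platform.data_center.startswith(data_center_prefix):
            continue
        for server in platform.servers:
            if server.server_type != server_type or server.tier != tier:
                continue
            for service in server.services:
                if not service.available:
                    hits.append((platform.name, server.name, service.name))
    return hits


# Made-up inventory: only the first platform matches "Tier 2, tc Server, US".
platforms = [
    Platform("us-west-linux-01", "US-West", [
        Server("tcs-a", "tc Server", 2,
               [Service("webapp", available=False), Service("connector")]),
    ]),
    Platform("emea-linux-01", "EMEA-London", [
        Server("tcs-b", "tc Server", 2, [Service("webapp", available=False)]),
    ]),
    Platform("us-east-linux-02", "US-East", [
        Server("tcs-c", "tc Server", 1, [Service("webapp", available=False)]),
    ]),
]

result = down_services(platforms, "tc Server", tier=2, data_center_prefix="US")
print(result)
```

In a real deployment you would more likely express "Tier 2 Applications in our US data centers" by grouping resources rather than by tagging them in code; the sketch only shows how the three inventory levels nest.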
Setting Up the Condition and Action for the Alert
Once you know what resource you want an alert for, you specify 1) the condition and 2) the action you want to happen as a result of the alert firing. By specifying a condition, we can compare values, set thresholds, or just look out for any change in a resource’s properties, actions, logs, or configuration files. Actions can include sending an alert to your phone, automatically recovering a previously fired alert, or automatically remediating standard problems to give administrators more time to find and fix the root cause. When sending alerts in a follow-the-sun model, we use multiple roles with alert calendars and an escalation scheme. Again, these are set up in advance so that the escalation scheme can be selected as the action.
To continue the example, completing this step lets us monitor whether a service goes down on a tc Server instance.
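The condition-plus-action idea can be sketched in a few lines. This is a conceptual model only, not how Hyperic implements alerts: `AlertDefinition`, the three condition helpers, and the convention that an availability of 0.0 means "down" are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A condition looks at the previous and current value of a metric and
# decides whether the alert should fire. (Hypothetical model.)
Condition = Callable[[Optional[float], float], bool]


def above_threshold(limit: float) -> Condition:
    """Fire when the current value exceeds a threshold."""
    return lambda previous, current: current > limit


def equals(value: float) -> Condition:
    """Fire when the current value equals a specific value."""
    return lambda previous, current: current == value


def value_changed() -> Condition:
    """Fire on any change from the previous reading."""
    return lambda previous, current: previous is not None and previous != current


@dataclass
class AlertDefinition:
    name: str
    condition: Condition
    action: Callable[[str], None]   # what to do when the alert fires

    def evaluate(self, previous: Optional[float], current: float) -> bool:
        if self.condition(previous, current):
            self.action(self.name)
            return True
        return False


# The action here just records the alert name; in practice it would be
# an escalation scheme that notifies the on-duty role.
fired: List[str] = []
service_down = AlertDefinition(
    name="tc Server service down",
    condition=equals(0.0),          # assumption: availability 0.0 means down
    action=fired.append,
)

service_down.evaluate(previous=1.0, current=1.0)   # still up, nothing happens
service_down.evaluate(previous=1.0, current=0.0)   # went down, action runs
print(fired)
```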
Setting Up Roles with Alert Calendars
At a high level, Hyperic is organized like many systems: roles define permissions, and users are assigned roles. When creating or editing roles, an admin can set permissions, resources, and calendar details. For the calendar details, we can select days, times, and exceptions for when users assigned to the role will be notified.
For example, we might have a role called “US OPS”, or get more specific and use roles like “Tier 2 Apps - Group A-USDC - GMT-8” and “Tier 2 Apps - Group A-USDC - GMT+530”, where each role has a different day/time selection but the same permissions to monitor Tier 2 App resources in our US data centers. For our example, we will simply use "US OPS", "APAC OPS" and "EMEA OPS", all with universal access to see all resources.
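Conceptually, an alert calendar answers one question: should this role be notified at this moment? Below is a minimal sketch of that check using days, times, and exceptions. `AlertCalendar` and its fields are hypothetical stand-ins, not Hyperic's data model, and the sample dates and hours are made up.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time
from typing import Set


@dataclass
class AlertCalendar:
    """Hypothetical stand-in for a role's alert calendar."""
    start: time                                   # window start, inclusive
    end: time                                     # window end, exclusive
    days: Set[int] = field(default_factory=lambda: set(range(7)))  # 0 = Monday
    exceptions: Set[date] = field(default_factory=set)             # no notifications

    def covers(self, moment: datetime) -> bool:
        if moment.date() in self.exceptions:
            return False
        if moment.weekday() not in self.days:
            return False
        now = moment.time()
        if self.start < self.end:
            return self.start <= now < self.end
        # The window wraps past midnight, e.g. 22:00 to 06:00.
        # Simplification: the weekday check uses the day of `moment` itself.
        return now >= self.start or now < self.end


# Weekdays, 8am to 4pm, with one holiday exception.
day_shift = AlertCalendar(
    start=time(8), end=time(16),
    days={0, 1, 2, 3, 4},
    exceptions={date(2024, 12, 25)},
)
# Every day, 10pm to 6am.
night_shift = AlertCalendar(start=time(22), end=time(6))

print(day_shift.covers(datetime(2024, 1, 15, 9, 0)))      # a Monday morning
print(day_shift.covers(datetime(2024, 1, 13, 9, 0)))      # a Saturday morning
print(night_shift.covers(datetime(2024, 1, 15, 23, 30)))  # late evening
```

A role with a calendar like `day_shift` would only be notified during its window, which is what lets several roles share one alert without everyone being paged.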
Setting Up the Escalation Scheme
An escalation runs a predefined sequence of notification steps until the alert is marked “fixed.” The escalation has settings to allow pause, deal with state change, or repeat. It also supports one or more actions like sending an email, SMS, syslog message, SNMP trap, or suppression. For an email or SMS, you simply choose a role to notify.
In our example, we would select all three of the roles defined above. Depending on the times set in the alert calendars, only one role would receive the notification. One group is notified from 12am to 8am PST, another from 8am to 4pm PST, and the third has the 4pm to 12am shift.
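As a rough mental model, an escalation walks through its steps in order and stops as soon as someone marks the alert fixed. The sketch below simulates that without real timers. `EscalationStep`, `run_escalation`, the per-step wait times, and the `max_rounds` cap are assumptions for illustration, not Hyperic's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EscalationStep:
    wait_minutes: int               # assumed delay before the next step runs
    notify: Callable[[str], None]   # e.g. email or SMS a role


def run_escalation(steps: List[EscalationStep], alert: str,
                   is_fixed: Callable[[int], bool],
                   repeat: bool = False, max_rounds: int = 3) -> int:
    """Run steps in order until is_fixed(elapsed_minutes) is True.

    Returns the simulated minutes elapsed when the escalation stopped.
    """
    elapsed = 0
    rounds = max_rounds if repeat else 1
    for _ in range(rounds):
        for step in steps:
            if is_fixed(elapsed):
                return elapsed
            step.notify(alert)
            elapsed += step.wait_minutes
    return elapsed


log: List[str] = []
steps = [
    EscalationStep(10, lambda alert: log.append("email OPS roles")),
    EscalationStep(20, lambda alert: log.append("SMS OPS roles")),
    EscalationStep(0, lambda alert: log.append("send SNMP trap")),
]

# Pretend someone marks the alert fixed 30 minutes after it fired.
elapsed = run_escalation(steps, "tc Server service down",
                         is_fixed=lambda minutes: minutes >= 30)
print(elapsed, log)
```

Here the email and SMS steps run, and the SNMP trap is never sent because the alert is already fixed by the time that step comes up.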
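The shift arithmetic is easy to get wrong across time zones, so here is a small sketch of it. It takes the time an alert fired in UTC, converts it to PST, and picks the role whose window covers it. Which role owns which window is our assumption (the example above does not say), PST is treated as a fixed UTC-8 offset that ignores daylight saving, and `role_to_notify` is a hypothetical helper, not a Hyperic function.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset; deliberately ignores daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")

# (start_hour, end_hour, role) in PST. The role-to-window mapping is assumed.
SHIFTS = [
    (0, 8, "EMEA OPS"),
    (8, 16, "US OPS"),
    (16, 24, "APAC OPS"),
]


def role_to_notify(fired_at_utc: datetime) -> str:
    """Return the role whose PST shift window contains the alert time."""
    local_hour = fired_at_utc.astimezone(PST).hour
    for start, end, role in SHIFTS:
        if start <= local_hour < end:
            return role
    raise ValueError("no shift covers this hour")


# 18:30 UTC is 10:30 PST, which falls in the 8am to 4pm window.
print(role_to_notify(datetime(2024, 1, 15, 18, 30, tzinfo=timezone.utc)))
```

Because the three windows tile the full 24 hours without overlap, exactly one role matches any given alert time.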
Of course, there are many ways to combine this functionality, and the alerting, roles and schedules are flexible enough to support a large-scale, global operation.
For more information, check out the documentation, see everything vFabric Hyperic supports, and get more info on vFabric Hyperic features. Or, even better, try it out!