Top 3 IT Operations Badges Every Administrator Should Know
Companies Want Efficient Operations
Understand VMware vRealize Operations Manager badges and Blue Medora True Visibility Suite (TVS) objects for efficient operation of your virtual infrastructure.
There are three major badges used in vRealize Operations (vROps): Health, Risk, and Efficiency. Each discovered object, from VMs to Cisco UCS blades, will have these badges.
VMware vRealize® Operations™ Manager Badge Basics
- The major badges are comprised of minor badges. Not all objects will have each minor badge, as not all minor badges are applicable to each object. This is the case for vCenter®, VM, Host, Datastore, and Blue Medora True Visibility Suite discovered objects.
- Badge health is reflected on a scale (score) from 0 (bad/red) to 100 (good/green).
- The badge score thresholds can be modified by the vROps administrator.
- Unavailable minor badges will be greyed out as indicated in the screenshot below.
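To make the scoring rules above concrete, here is a minimal Python sketch of how a 0 to 100 badge score could be mapped to a color using adjustable thresholds. It is not vROps code: the `BadgeThresholds` class, the `badge_color` function, the intermediate orange and yellow bands, and the default cutoffs of 25, 50, and 75 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BadgeThresholds:
    """Hypothetical score boundaries that an administrator could tune."""
    red_below: int = 25
    orange_below: int = 50
    yellow_below: int = 75


def badge_color(score: Optional[float],
                thresholds: BadgeThresholds = BadgeThresholds()) -> str:
    """Map a 0-100 badge score to a color; None stands for an unavailable badge."""
    if score is None:
        return "grey"  # unavailable minor badges are greyed out
    if not 0 <= score <= 100:
        raise ValueError("badge score must be between 0 and 100")
    if score < thresholds.red_below:
        return "red"
    if score < thresholds.orange_below:
        return "orange"
    if score < thresholds.yellow_below:
        return "yellow"
    return "green"
```

Passing a different `BadgeThresholds` instance stands in for an administrator changing the score thresholds: the same score can then land in a different color band.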
Let’s explore these 3 major vRealize badges.
1. Health Badge – Deals With Immediate Issues
This badge reflects the current health of a particular object: a VM, Host, Datastore, or any Blue Medora TVS discovered object such as a NetApp FAS Aggregate or a Cisco UCS Blade. How, then, is health determined? Health is composed of values from three minor badges: Faults, Anomalies, and Workload. How are these minor badges defined?
- Faults Badge – The Faults badge measures the degree of problems that the object might experience, based on events retrieved from the vCenter Server® or by the Blue Medora TVS adapter (the terms Management Pack and adapter are often used interchangeably). The Faults score is calculated from events published by the vCenter Server or the TVS target (the endpoint technology, such as AWS, that the adapter connects to in order to import data).
- This score includes events like loss of redundancy in NICs or HBAs, memory checksum errors, HA failover problems, CIM events, and so on. Faults are included in the health score because they require immediate resolution. The scores are computed based on the severity of the underlying problems.
- Resolution of the problem indicated by the Fault will restore the resource’s health score. Unlike other badges in vROps, the Faults badge does not have an alert generated from its threshold score. Instead, each problem generates its own fault alert. Resolving the problem clears or cancels that alert and lowers the Faults badge score. The Faults badge for a Datastore is shown below:
- Anomalies Badge – vROps calculates dynamic thresholds for each metric that is collected for an object.
- vROps also analyzes the number of metrics that are violating their dynamic thresholds to determine trends and normal levels of threshold violations. Based on these trends, the Anomalies badge score is calculated using the total number of threshold violations for all metrics for the selected object and its child objects.
- The vROps Anomalies badge score represents how abnormal the behavior of the object is, based on its historical metrics data. Because changes in behavior often indicate developing problems, if the metrics of an object go outside the calculated thresholds, the anomalies score for the object grows. As more metrics breach the thresholds, anomalies continue to increase.
- Violations by KPI metrics increase the Anomalies score more than violations by non-KPI metrics. A high number of anomalies usually indicates a problem or at least a situation that requires your attention. The Anomalies badge for a Datastore is shown below:
- Workload Badge – Workload in vROps is the demand for resources that an object wants versus the actual capacity the object is able to access. The Workload badge value is a score based on how hard an object must work for resources. Use the Workload value as an investigative tool when you are researching capacity constraints or evaluating the general state of objects in your environment.
- vROps indicates the workload by a colored icon that is based on the defined badge score thresholds. The Workload badge for a NetApp FAS LUN is shown here:
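The Workload and Anomalies descriptions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the vROps algorithm: the function names, the `kpi_weight` value of 2.0, and the normalization are hypothetical, and the sketch ignores the trend analysis of normal violation levels that vROps performs.

```python
def workload_score(demand: float, usable_capacity: float) -> float:
    """Demand as a percentage of the capacity the object can access.

    Values above 100 mean the object wants more than it can get.
    """
    if usable_capacity <= 0:
        raise ValueError("usable_capacity must be positive")
    return 100.0 * demand / usable_capacity


def anomalies_score(kpi_violations: int, other_violations: int,
                    kpi_total: int, other_total: int,
                    kpi_weight: float = 2.0) -> float:
    """Weighted share (0-100) of metrics outside their dynamic thresholds.

    KPI metrics count kpi_weight times as much as other metrics
    (the weight is an assumption chosen for illustration).
    """
    weighted_total = kpi_weight * kpi_total + other_total
    if weighted_total <= 0:
        raise ValueError("at least one metric is required")
    weighted_violations = kpi_weight * kpi_violations + other_violations
    return 100.0 * weighted_violations / weighted_total
```

For an object with 2 KPI metrics and 8 other metrics, `anomalies_score(1, 0, 2, 8)` is larger than `anomalies_score(0, 1, 2, 8)`, which mirrors the statement that KPI violations raise the score more than non-KPI violations.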
2. Risk Badge – Deals With Future Issues
The Risk badge indicates potential problems that might eventually degrade the performance of the system. Risk does not necessarily imply a current problem; it indicates problems that might require your attention in the near future, but not immediately. vROps calculates the Risk score from three minor badges, Stress, Time Remaining, and Capacity Remaining, by applying the inverse geometric weighted mean. How are these minor badges defined?
- Stress Badge – Stress analysis is how vROps calculates the amount of demand an object generates over a period of time.
- This analysis looks at the object’s workload against its capacity, which helps in sizing the object to meet its resource demands. The Stress score indicates the historic workload of the selected object. While the Workload score shows a snapshot of the current resource usage, the Stress score analyzes the resource usage data over a longer period. The Stress score is calculated as a ratio between the demand for resources and the usable capacity for a certain period. The Stress score helps you identify Hosts and VMs that don’t have enough resources allocated, or Hosts that are running too many VMs.
- Stress applies to Blue Medora TVS discovered objects as well. A high Stress score does not imply a current performance problem, but highlights potential for future performance problems.
- Below is a screenshot showing the Stress of a Blue Medora TVS discovered NetApp LUN:
- Time Remaining Badge – This badge score indicates how much time is remaining before the resources of the object are exhausted.
- It is calculated per resource type for an object, for example CPU usage or disk I/O, based on the historical data for the object type. From this historical data, the Time Remaining score represents the estimated time remaining. The Time Remaining score allows you to plan the provisioning of physical or virtual resources for the selected object, or to change the workload to match the resources available in your virtual environment.
- vROps calculates the Time Remaining score as a percentage of time that is remaining for each compute resource compared to the provisioning buffer you set in the Configuration dialog box. By default, the Time Remaining score provisioning buffer is 30 days. If even one of the compute resources has less capacity than the provisioning buffer, the Time Remaining score is 0. The screenshot below shows Time Remaining for an HPE 3PAR array:
- Capacity Remaining Badge – This badge represents the capacity remaining in your environment.
- It represents the capability of your virtual environment to accommodate new VMs. The remaining VM count represents the number of VMs that can be deployed on the selected object, based on the current amount of unused resources and the average VM profile for the last “n” weeks. vROps calculates the Capacity Remaining score as a percentage of the remaining capacity count compared to the total amount of capacity that can be deployed on the selected object. Capacity Remaining is shown below for a NetApp FAS volume:
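The three Risk minor badges and their combination can be sketched in Python as follows. Treat this as a reading of the descriptions above under explicit assumptions, not as the vROps implementation: the function names, the 365-day `horizon_days` used to scale Time Remaining, the equal default weights, and the interpretation of "inverse geometric weighted mean" as 100 minus a weighted geometric mean are all hypothetical.

```python
import math
from typing import Mapping, Sequence


def stress_score(demand_samples: Sequence[float], usable_capacity: float) -> float:
    """Average demand over a period as a percentage of usable capacity (capped at 100)."""
    if usable_capacity <= 0 or not demand_samples:
        raise ValueError("need positive capacity and at least one sample")
    average_demand = sum(demand_samples) / len(demand_samples)
    return min(100.0, 100.0 * average_demand / usable_capacity)


def time_remaining_score(days_left: Mapping[str, float],
                         buffer_days: float = 30,
                         horizon_days: float = 365) -> float:
    """0 if any resource runs out within the provisioning buffer, else scaled to 100.

    horizon_days is an assumed planning horizon used only for scaling.
    """
    if horizon_days <= buffer_days:
        raise ValueError("horizon_days must exceed buffer_days")
    worst = min(days_left.values())  # the scarcest resource decides
    if worst <= buffer_days:
        return 0.0
    return min(100.0, 100.0 * (worst - buffer_days) / (horizon_days - buffer_days))


def capacity_remaining_score(remaining_vms: int, total_vms: int) -> float:
    """Remaining VM count as a percentage of the total deployable VM count."""
    if total_vms <= 0:
        raise ValueError("total_vms must be positive")
    return 100.0 * remaining_vms / total_vms


def risk_score(stress: float, time_remaining: float, capacity_remaining: float,
               weights: Sequence[float] = (1, 1, 1)) -> float:
    """Assumed form: 100 minus the weighted geometric mean of 'goodness' values."""
    goodness = (100.0 - stress, time_remaining, capacity_remaining)
    if any(not 0 <= g <= 100 for g in goodness):
        raise ValueError("all sub-scores must be between 0 and 100")
    total_weight = sum(weights)
    mean = math.prod(g ** (w / total_weight) for g, w in zip(goodness, weights))
    return 100.0 - mean
```

One consequence of using a geometric mean in this sketch is that a single exhausted sub-score, such as a Time Remaining of 0, drives Risk to 100 regardless of the other two values.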
3. Efficiency Badge – Deals With Optimization Opportunities
Efficiency does not tell you about current or future performance problems but tells you how to run a more efficient datacenter. Efficiency is comprised of Reclaimable Capacity and Density minor badges. Let’s explore them here.
- Reclaimable Capacity Badge – Reclaimable Capacity is the amount of provisioned capacity that can be reclaimed without causing stress or performance degradation. Reclaimable Capacity is calculated for each resource type, such as CPU, memory, and disk, for each object in the environment, including those discovered by Blue Medora. It identifies the amount of resources that can be reclaimed and provisioned to other objects in your environment. Below is the Reclaimable Capacity for a VM:
- Density Badge – The Density score indicates the consolidation ratios, such as virtual machines per host, virtual CPUs per physical CPU, virtual memory per physical memory, and so on. You can use the Density score to achieve higher consolidation ratios and cost savings. Density for an ESXi Host is shown below:
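The consolidation ratios and the idea of reclaimable capacity can be illustrated with a short Python sketch. The function names and the notion of a "recommended" size per resource are assumptions made for this example; how vROps derives its recommendations is outside the scope of the sketch.

```python
from typing import Dict, Mapping


def density_ratios(vm_count: int, host_count: int,
                   vcpus: int, pcpus: int,
                   vmem_gb: float, pmem_gb: float) -> Dict[str, float]:
    """Consolidation ratios of the kind the Density badge summarizes."""
    return {
        "vms_per_host": vm_count / host_count,
        "vcpu_per_pcpu": vcpus / pcpus,
        "vmem_per_pmem": vmem_gb / pmem_gb,
    }


def reclaimable(provisioned: Mapping[str, float],
                recommended: Mapping[str, float]) -> Dict[str, float]:
    """Per-resource capacity that could be handed back.

    Computed as provisioned minus recommended size, never negative.
    Resources without a recommendation are treated as not reclaimable.
    """
    return {
        resource: max(0, amount - recommended.get(resource, amount))
        for resource, amount in provisioned.items()
    }
```

For example, a VM provisioned with 8 vCPUs and 32 GB of memory, with a hypothetical recommended size of 4 vCPUs and 40 GB, would show 4 reclaimable vCPUs and no reclaimable memory.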
VMware vRealize Operations Manager: