How VMware IT Modernized Their Monitoring Tools

by: VMware Senior Monitoring Tools and Automation Engineer George Stephen Manuel; VMware Senior Manager, Observability Analytics & Tools Ravishankar Rao; and VMware Programmer/Analyst, Professional Satyanarayanaraju Bhupathiraju

As VMware continued to make acquisitions and drive organic growth, the infrastructure monitoring landscape expanded to include many thousands of devices, significantly increasing device licensing costs.

Simultaneously, challenges with configuration and tool management started to build up. The increased complexity of our infrastructure landscape and growing demand for modern, proactive features such as dashboards, reports, rich historical data, and real-time monitoring made it clear that the existing tooling was no longer adequate.

Defining Solution Requirements and Tool Selection

Before we got to work migrating to new infrastructure monitoring solutions, we needed to formulate a solid definition of success: what we were trying to achieve, and how we would know when we got there. To get started, VMware IT prepared a case study focused on the current data collection methodology, frequency, metrics, and how the existing tool issued alerts and notifications. It included the service owner requirements according to application type for new custom groups, dashboards, views, and scheduled reports. We collected all the custom monitoring requirements and created user stories to support the service owners’ requests.

VMware IT recognized the need to move from traditional monitoring to a more modern approach. Shifting from reactive monitoring to a proactive, modern stance would help us identify new issues and root causes more quickly. The advanced reporting capabilities in modern monitoring solutions provide the flexibility to customize performance metrics based on service owner requirements.

VMware IT selected VMware vRealize® Operations Manager for our infrastructure monitoring, to gain complete application-to-storage visibility across physical, virtual, and cloud infrastructures. We can now investigate and solve complex technical issues faster because of the more precise analytics provided by vRealize Operations. Once the vCenters are identified and plugged in, all components under the purview of each vCenter are automatically monitored throughout the lifecycle of the component.

Preparing for Migration Included Customization

A seamless migration that kept monitoring continuous and had no impact on service owners was a critical objective that drove the team’s preparations. The first step was to set up the vRealize Operations node and location-based collectors behind the load balancer, to meet network latency and high availability standards.

Customization was a key area of focus. We ensured that each application was assigned the correct owner and tagged accordingly, so that alert notifications could be configured based on the application owner’s specifications and policies.

In vRealize Operations, we used the “delay” feature to set alert timeframes according to the requirements of service owners. This helped us ensure that all the alerts received by service owners are actionable. We also created container tags to distinguish our inventory as production, non-production, and maintenance, which helps our team provide efficient operational support.
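To illustrate why a delay keeps alerts actionable, here is a minimal Python sketch of the general idea: an alert is raised only when a condition persists for several consecutive collection cycles, so a brief spike is ignored. This is not how vRealize Operations implements the feature; the function name, the threshold, and the `wait_cycles` parameter are all hypothetical names chosen for this example.

```python
def should_alert(samples, threshold, wait_cycles):
    """Return True only if the most recent `wait_cycles` samples
    all exceed `threshold` (hypothetical delay logic)."""
    if wait_cycles <= 0 or len(samples) < wait_cycles:
        # Not enough history yet to confirm a sustained breach.
        return False
    recent = samples[-wait_cycles:]
    return all(value > threshold for value in recent)


# Hypothetical CPU usage readings, one per collection cycle.
spike = [50, 95, 60]       # a single brief breach
sustained = [50, 95, 97]   # two consecutive breaches

print(should_alert(spike, threshold=90, wait_cycles=2))
print(should_alert(sustained, threshold=90, wait_cycles=2))
```

With a two-cycle delay, the brief spike does not raise an alert, while the sustained breach does. A service owner who wants fewer, more reliable alerts would ask for a longer delay.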

We wrote several scripts based on the vRealize Operations Suite API to automate the configuration of the monitoring tool. These scripts:

  • Simplified the migration of more than 4,000 devices
  • Separated devices by operating system (OS)
  • Grouped and mapped them according to specific policies
  • Attached objects and devices to service owners
  • Created remote checks for critical items mapped to container tags
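The grouping and mapping steps in this list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real scripts called the vRealize Operations Suite API, which is left out here, and the inventory fields, policy names, and function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory rows, as they might be exported from the legacy tool.
INVENTORY = [
    {"name": "web-01", "os": "Linux", "owner": "team-web", "env": "production"},
    {"name": "web-02", "os": "Linux", "owner": "team-web", "env": "non-production"},
    {"name": "db-01", "os": "Windows", "owner": "team-db", "env": "production"},
]

# Hypothetical mapping from (OS, environment tag) to a monitoring policy name.
POLICY_BY_OS_ENV = {
    ("Linux", "production"): "linux-prod-policy",
    ("Linux", "non-production"): "linux-nonprod-policy",
    ("Windows", "production"): "windows-prod-policy",
}
DEFAULT_POLICY = "default-policy"


def group_by_os(devices):
    """Separate device names by operating system."""
    groups = defaultdict(list)
    for device in devices:
        groups[device["os"]].append(device["name"])
    return dict(groups)


def assign_policies(devices):
    """Map each device name to a policy, falling back to a default."""
    return {
        d["name"]: POLICY_BY_OS_ENV.get((d["os"], d["env"]), DEFAULT_POLICY)
        for d in devices
    }


def devices_by_owner(devices):
    """Attach device names to their service owner."""
    owners = defaultdict(list)
    for device in devices:
        owners[device["owner"]].append(device["name"])
    return dict(owners)


print(group_by_os(INVENTORY))
print(assign_policies(INVENTORY))
print(devices_by_owner(INVENTORY))
```

Keeping the grouping logic separate from the API calls means the mapping can be reviewed with service owners before anything is created in the monitoring tool.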

Once we completed writing these scripts, it was time to launch the migration of devices from the current tool to vRealize Operations.

How We Approached Migration and Implementation

To launch this phase, we completed several steps and collaborated closely with service owners to ensure their acceptance of the vRealize Operations alerts.

The first step was to extract the master object inventory list from the existing tool. Next, the VMware IT infrastructure team installed agents on servers and confirmed the object availability from vRealize Operations. Finally, we ran the prepared scripts to create the objects and map them to the groups and policies from the existing tool’s object inventory list.

The plan was to run parallel monitoring between the current tool and vRealize Operations for three months. We created a separate channel in the Network Operations Centers (NOC) dashboard to receive all vRealize Operations alerts and run a comparison between the existing tool and vRealize Operations alerts against the objects.

During the parallel monitoring stage, VMware IT exported the data from the NOC dashboard, then worked together with the service owners to compare and fine-tune the alerts. Once the service owners accepted the alerts issued from vRealize Operations, we discontinued parallel monitoring and began using vRealize Operations as the primary monitoring tool for VMware IT infrastructure.
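A comparison of this kind can be sketched as a set operation over exported alert records. The snippet below is a minimal Python illustration, not the process VMware IT actually ran: the record fields (`object`, `alert`), the sample data, and the function names are assumptions made for the example.

```python
def alert_keys(alerts):
    """Reduce alert records to comparable (object, alert) pairs."""
    return {(a["object"], a["alert"]) for a in alerts}


def compare_alerts(legacy_alerts, new_alerts):
    """Split alerts into those both tools raised and those only one raised."""
    legacy = alert_keys(legacy_alerts)
    new = alert_keys(new_alerts)
    return {
        "matched": sorted(legacy & new),
        "legacy_only": sorted(legacy - new),
        "new_only": sorted(new - legacy),
    }


# Hypothetical exports from the two tools for the same time window.
legacy_export = [
    {"object": "web-01", "alert": "cpu_high"},
    {"object": "db-01", "alert": "disk_full"},
]
new_export = [
    {"object": "web-01", "alert": "cpu_high"},
    {"object": "web-02", "alert": "memory_high"},
]

report = compare_alerts(legacy_export, new_export)
for category, items in report.items():
    print(category, items)
```

Alerts that appear in only one tool are the ones worth reviewing with service owners: an alert missing from the new tool may point to a gap in coverage, while an extra alert may need tuning.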

Benefits of Migrating to a Modern Monitoring Tool

In addition to eliminating the cost of a third-party monitoring tool, using this modern solution provides numerous advantages:

  • Monitoring tool isolates issues quickly
  • Application owners self-check their application health
  • Application owners can create their own application-specific dashboards
  • Operations team works on the system proactively
  • Infrastructure team uses monitoring data to plan for capacity use
  • Application management packs provide a deeper level of application-specific metrics
  • Adapters and API suites make it easy to plug in objects for auto-discovery and to automate operational work

Results of Deploying a Modern Monitoring Tool

Overall, we’re providing an improved monitoring solution to our service owners. It is customized to meet their specific application requirements, including when to generate system alerts and notifications.

When we measured system performance three months after the vRealize Operations implementation, the ratio of alerts to actionable alerts for infrastructure monitoring had improved by 20 percent, and proactive monitoring had helped us reduce incidents by 10 percent.

Our new monitoring posture enables us to accommodate future growth requirements and prepares us to embrace migration from on-prem to cloud monitoring solutions such as VMware vRealize® Operations Cloud.

VMware on VMware blogs are written by IT subject matter experts sharing stories about our digital transformation using VMware products and services in a global production environment.