By: Chuck Petrie, Blue Medora
Storage can be difficult to troubleshoot. Perhaps the hardest question is whether storage is even to blame. In vRealize Operations, management packs can be tightly integrated with the virtual layer. The Management Pack for NetApp Storage provides a holistic view of the virtual layer and storage so you can understand performance bottlenecks. You can identify whether the bottleneck is in storage and, at the same time, see where the traffic is coming from in the virtual layer. This post walks you through some of the options for troubleshooting NetApp with vRealize Operations.
Figure 1 – View of the NetApp Relationships
Before I start, I want to note that this Management Pack ties directly into OnCommand Unified Manager via API Services to provide performance metrics, relationships, and events. In Figure 1 we can see the relationships in a NetApp system. Data is shown from the cluster all the way down to the disk. Out of the box, the Management Pack offers 14 dashboards, 25 views, and over 700 metrics.
Figure 2 – NetApp Storage Topology dashboard
Custom dashboards offer at-a-glance views into your NetApp environment, as we can see in Figure 2. In this example, I have selected a NetApp volume. The volume-to-VM relationship is brought into focus throughout the stack. Details on the volume are provided in the Health Tree, Metric Sparklines, and Metric Graph widgets. The volume itself is in a healthy state, but the relationship map shows red (critical) statuses on the Aggregate, System, and Cluster, so those are the objects we should investigate.
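The idea of following a healthy volume up the stack to find unhealthy parents can be sketched in a few lines of Python. This is only an illustration: the object names, the parent links, and the health values are made up for the example and are not read from vRealize Operations or the Management Pack.

```python
# Hypothetical parent links (child -> parent) and health states.
# All names and values below are invented for illustration.
PARENT = {
    "vol_data01": "aggr1",
    "aggr1": "node-01",
    "node-01": "cluster-a",
}
HEALTH = {
    "vol_data01": "green",
    "aggr1": "red",
    "node-01": "red",
    "cluster-a": "red",
}


def critical_ancestors(obj, parent=PARENT, health=HEALTH):
    """Walk up from obj and return the ancestors marked red, nearest first."""
    found = []
    current = parent.get(obj)
    while current is not None:
        if health.get(current) == "red":
            found.append(current)
        current = parent.get(current)
    return found


if __name__ == "__main__":
    print(critical_ancestors("vol_data01"))
```

Starting from the volume, the walk returns the aggregate, system, and cluster in that order, which mirrors the order in which you would check them in the relationship map.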
Figure 3 – Summary Tab of NetApp Cluster
Let’s move on to the Environment tab, where we’ll select a NetApp cluster under the Inventory Tree. A summary page for the NetApp cluster will appear (see Figure 3). On this screen we can see the health alerts for both the NetApp cluster and all of its descendants. This lets us decide, based on criticality, which alerts should be investigated immediately and which can wait. In this example, one of the alerts listed under Top Risk Alerts For Descendants tells us that two objects are impacted by the following alert: “Virtual Machine has chronic high memory workload leading to stress on a virtual machine.” If this alert is selected, a list of impacted objects appears and we can navigate to the alert menu to begin troubleshooting.
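As a rough illustration of triaging by criticality, the sketch below sorts a handful of alerts so the most severe ones, and among those the ones touching the most objects, come first. The `Alert` class, the severity ranking, and the criticality assigned to each alert are assumptions for the example; they are not exported from vRealize Operations.

```python
from dataclasses import dataclass

# Assumed ranking: lower number means investigate sooner.
SEVERITY_RANK = {"critical": 0, "immediate": 1, "warning": 2, "info": 3}


@dataclass
class Alert:
    name: str
    criticality: str       # assumed value, one of the SEVERITY_RANK keys
    impacted_objects: int  # number of objects the alert touches


def triage(alerts):
    """Order alerts by criticality, then by impacted object count (descending)."""
    return sorted(
        alerts,
        key=lambda a: (SEVERITY_RANK[a.criticality], -a.impacted_objects),
    )


if __name__ == "__main__":
    # Criticality values here are invented for the example.
    alerts = [
        Alert("Example informational alert", "info", 5),
        Alert("Virtual Machine has chronic high memory workload", "warning", 2),
        Alert("Example critical alert", "critical", 1),
    ]
    for alert in triage(alerts):
        print(alert.criticality, alert.impacted_objects, alert.name)
```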
Figure 4 – NetApp Volume Based Metrics
In the Troubleshooting tab, we find the more than 700 out-of-the-box metrics mentioned above. In Figure 4 we see three NetApp volume metrics: average latency, operations per second, and throughput. These are the key performance metrics for a storage array. vRealize Operations’ predictive analytics differentiates normal behavior from anomalous behavior for us. Here the metric graph highlights an anomaly in Average Latency (ms) in yellow at around 9 PM.
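To make the idea of "normal versus anomalous" concrete, here is a minimal sketch that learns a band from baseline latency samples and flags any reading outside it. This is not the analytics vRealize Operations uses; it is a simple mean plus or minus k standard deviations band, and the latency values and the `k` parameter are invented for the example.

```python
from statistics import mean, stdev


def find_anomalies(baseline, samples, k=3.0):
    """Return (label, value) pairs that fall outside mean +/- k * stdev of baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [(label, value) for label, value in samples if not lower <= value <= upper]


# Hypothetical average latency readings in milliseconds.
baseline = [2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 2.0, 2.1]
samples = [("7 PM", 2.2), ("8 PM", 2.0), ("9 PM", 9.5), ("10 PM", 2.3)]

if __name__ == "__main__":
    print(find_anomalies(baseline, samples))
```

With these made-up numbers only the 9 PM reading falls outside the band. A fixed band like this ignores daily and weekly patterns, which is one reason a product that learns thresholds over time is more useful than a static limit.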
Now you have some more tools for finding bottlenecks in NetApp with vRealize Operations!