Storage Troubleshooting in a 3-2-1 Hardware Stack

By: Chuck Petrie, Blue Medora

As the storage market transitions to all-flash, hybrid array storage providers are working to accelerate performance and utilization. As they do, concepts such as hot data and dynamic, agile storage capacity become harder to categorize. Today we follow up on a previous blog in which we investigated network troubleshooting in a 3-2-1 hardware stack. As a reminder, the term “3-2-1” refers to a redundant architecture of 3 servers, 2 switches, and 1 storage array, or some derivative of that (2-2-1, etc.). We will dive into vRealize Operations to determine the root cause of an issue at the storage layer of a 3-2-1 stack and expand on using a robust management console as a data aggregator.

To begin troubleshooting, we will look at the relationships of the 3-2-1 stack using the Management Pack for Nimble Storage to determine which storage array(s) are associated with the environment. Once we understand the relationships, we will investigate performance and capacity metrics that point to issues caused by poor planning and overutilization of resources.

Figure 1 – Custom 3-2-1 (Derivative) Stack Dashboard

To show how quickly a 3-2-1 architecture can become complex, we’ve built a custom dashboard, shown in Figure 1. A simple concept becomes complex as we add objects such as storage pools, volumes, and users. Each of these objects can exhibit a symptom or be the root cause of an issue. Looking at this relationship map, we can see dependencies through the stack and understand the health of each object based on its color. In vRealize Operations, a yellow object indicates a warning, an orange object an immediate need, and a red object a critical alert.
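To make the idea of reading health through a relationship map concrete, here is a minimal Python sketch. It does not use the vRealize Operations API or reproduce how the product computes health; the object names, the `children` map, and the `worst_in_subtree` helper are all hypothetical. Only the color-to-severity mapping comes from the description above.

```python
from enum import IntEnum


class Severity(IntEnum):
    OK = 0
    WARNING = 1
    IMMEDIATE = 2
    CRITICAL = 3


# Color convention described in the text (green for healthy is assumed).
COLOR = {
    Severity.OK: "green",
    Severity.WARNING: "yellow",
    Severity.IMMEDIATE: "orange",
    Severity.CRITICAL: "red",
}

# Hypothetical storage-side slice of a 3-2-1 stack: parent -> children.
children = {
    "array-1": ["pool-1"],
    "pool-1": ["volume-a", "volume-b"],
    "volume-a": ["datastore-a"],
    "volume-b": ["datastore-b"],
    "datastore-a": ["vm-1", "vm-2"],
    "datastore-b": ["vm-3"],
}

# Hypothetical alert state; objects not listed are treated as healthy.
own_severity = {
    "volume-a": Severity.IMMEDIATE,
    "vm-3": Severity.WARNING,
}


def worst_in_subtree(name):
    """Return (severity, object_name) for the most severe object at or below name."""
    best = (own_severity.get(name, Severity.OK), name)
    for child in children.get(name, []):
        candidate = worst_in_subtree(child)
        if candidate[0] > best[0]:
            best = candidate
    return best


severity, culprit = worst_in_subtree("array-1")
print(f"array-1 subtree: {COLOR[severity]} because of {culprit}")
```

Starting from the array and walking down is one simple way to find where to look first; a real relationship map also has edges to hosts and switches, which this sketch leaves out.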

Figure 2 – Nimble Health Investigation Dashboard

To quickly understand the alerts associated with the storage and virtual layers of the 3-2-1 stack, we can use out-of-the-box dashboards. As shown in Figure 2, we can see each object in the storage layer and quickly see all alerts associated with it, down to the virtual layer.

Figure 3 – Nimble Hosted VMs Dashboard

Using another dashboard, we can understand the capacity and performance of the virtual layer and how the underlying storage affects it. At a glance, the Nimble Hosted VMs Dashboard (Figure 3) associates each VM with its datastore and the underlying storage volume. Alongside the relationships, we can see the capacity and performance of each object and quickly determine how one object impacts another.

Figure 4 – Nimble Storage Troubleshooting Capacity Metrics

Drilling further into the storage layer, we can see capacity trends. As shown in Figure 4, we can look deeper into capacity utilization to investigate deduplication ratios, compression ratios, and replication capacities.

Figure 5 – Nimble Storage Troubleshooting Performance Metrics

One of the most important tasks within an environment is assessing performance and diagnosing performance-based issues. Using the troubleshooting tab within vRealize Operations, we can investigate performance metrics to determine latencies and associate them with performance issues. As shown in Figure 5, we can see cache hit percentages, read/write IO, and read/write throughput. Going a step further, we can determine our block size and understand how sequential and non-sequential data impacts our storage systems.
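As a rough illustration of how block size relates to the other metrics, average block size can be estimated by dividing throughput by IO operations per second, and a cache hit percentage is hits over total reads. The Python sketch below uses hypothetical function names and sample numbers; it is not taken from the management pack, which may derive these values differently.

```python
def average_block_size_kib(throughput_mib_s, iops):
    """Estimate average IO size in KiB from throughput (MiB/s) and IOPS."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return throughput_mib_s * 1024 / iops


def cache_hit_percent(hits, misses):
    """Share of reads served from cache, as a percentage."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total


# Hypothetical sample: 200 MiB/s at 6,400 IOPS, 900 cache hits and 100 misses.
print(average_block_size_kib(200, 6400))  # 32.0
print(cache_hit_percent(900, 100))        # 90.0
```

A large estimated block size alongside a low cache hit percentage often points to sequential workloads such as backups, while small blocks with high hit rates are more typical of random IO, though that reading depends on the array and the workload.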

Figure 6 – Nimble Capacity Remaining

Another great feature we can use inside vRealize Operations is the ability to understand change rates within our environment (Figure 6). The management console can determine our data trends based on where we were and where we are, and then systematically calculate where we are headed.
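The "where we were, where we are, where we will be" idea can be sketched with a simple straight-line fit. This is only an assumption for illustration: the source does not describe the analytics vRealize Operations actually uses, and the `days_until_full` helper and the sample data below are hypothetical.

```python
def days_until_full(samples, capacity_gib):
    """Project days until capacity is reached from (day, used_gib) samples.

    Fits a least-squares line through the samples and extrapolates.
    Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    full_day = (capacity_gib - intercept) / slope
    return max(0.0, full_day - last_day)


# Hypothetical volume growing 10 GiB per day toward a 1,000 GiB limit.
history = [(0, 500), (1, 510), (2, 520), (3, 530)]
print(days_until_full(history, 1000))  # 47.0
```

A straight line is the simplest possible trend model; it ignores seasonality and sudden changes in growth, so treat the result as a rough planning figure.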

Troubleshooting a 3-2-1 stack can be simplified by extending the capabilities of vRealize Operations. Using third-party management packs to build relationships through the stack and provide key performance and capacity indicators gives us a centralized management console. Walking through vRealize Operations today, we showed the value of coupling an analytical engine with third-party management packs to streamline troubleshooting within a 3-2-1 infrastructure stack.

For more information or a free trial of the Management Pack for Nimble Storage by Blue Medora, visit the product page on Blue Medora’s website.