vSAN Capacity Management and Monitoring Series
VMware vSAN provides a few dashboards for tracking capacity utilization, free space remaining, deduplication and compression efficiency, and more. There are also the classic vCenter Server alarms that provide various alerts for vSAN, just as they do for other datastore types (VVols, VMFS, NFS). Last, but not least, vSAN Health includes capacity-related health checks. This blog article series takes a closer look at these capacity management features and explains how to get the most value from them.
The first thing I will point out is that the three items introduced in the opening paragraph are all available in the vSphere Client. vCenter Server alarms have been in place for years, and navigating the vSAN capacity dashboards and vSAN Health is just as intuitive. There is no need to install, configure, and learn another tool for managing storage.
vSAN Capacity Overview
Let’s start the series with the vSAN Capacity Overview section, which summarizes datastore capacity at a glance. It is easy to see the total raw capacity of the vSAN datastore, how much of that capacity is consumed, and how much raw free space remains.
Note that I used the term “raw” when describing these metrics. That is because these numbers do not reflect how much usable capacity they translate to. HCI storage solutions, including vSAN, use various forms of mirroring and erasure coding to help ensure the availability of data in the event of a drive or entire host failure.
The level of resilience configured has a direct impact on how much raw capacity is used. For example, a 100GB virtual disk that is mirrored (RAID-1) to withstand the loss of one drive or host requires up to 200GB of raw capacity—100GB for each mirror. If that 100GB virtual disk must withstand two simultaneous failures, three mirrored copies are needed, which requires up to 300GB of raw capacity.
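The mirroring arithmetic above fits in one line: with RAID-1, tolerating FTT failures takes FTT + 1 full copies. Here is a minimal sketch of that rule. The function name is hypothetical, and it deliberately ignores witness components and metadata overhead.

```python
def raid1_raw_gb(written_gb: float, ftt: int) -> float:
    """Raw capacity (GB) consumed by a RAID-1 object that tolerates `ftt` failures.

    `written_gb` is the data actually written. For a completely full
    thin-provisioned disk, this equals the configured size.
    Witness components and metadata overhead are ignored in this sketch.
    """
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    # RAID-1 keeps ftt + 1 full copies of the data.
    return written_gb * (ftt + 1)


if __name__ == "__main__":
    print(raid1_raw_gb(100, 1))  # 200
    print(raid1_raw_gb(100, 2))  # 300
```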
The latest version of vSAN makes it easy to see how much free usable capacity is available based on the assignment of a given storage policy. In the screen shot below, we see that the total raw capacity of the vSAN datastore is 5.82TB, of which 4.02TB is free (raw).
The vSAN Capacity Overview section enables us to select a storage policy and see how much usable free space remains. In our example, there is 2.01TB of “Free Usable with Policy” capacity, which is half of the 4.02TB of free raw capacity. That is exactly what is expected considering two copies of each object are created when the vSAN Default Storage Policy is assigned. The vSAN Default Storage Policy specifies the use of RAID-1 mirroring and a Number of Failures to Tolerate (FTT) of one. Objects that are assigned the vSAN Default Storage Policy will have two mirrored copies.
The math becomes a bit more complicated when RAID-5/6 erasure coding is used for availability. Again, the latest version of vSAN makes it easy. Simply select a storage policy that contains a RAID-5 or RAID-6 fault tolerance method rule, and vSAN provides an estimate of usable free space: 3.03TB in this case.
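If you want a rough sanity check on the "Free Usable with Policy" figure, you can multiply free raw capacity by the share of raw space that holds data under each rule. The sketch below assumes the classic vSAN layouts: RAID-1 keeps FTT + 1 full copies, RAID-5 stripes 3 data + 1 parity, and RAID-6 stripes 4 data + 2 parity. Other vSAN architectures or versions may use different stripe widths, so adjust the table for your environment. The rule labels and function name are hypothetical.

```python
# Share of raw capacity that holds VM data under each rule.
# Assumption: classic vSAN layouts; RAID-1 keeps FTT + 1 full copies,
# RAID-5 stripes 3 data + 1 parity, RAID-6 stripes 4 data + 2 parity.
USABLE_FRACTION = {
    "RAID-1, FTT=1": 1 / 2,
    "RAID-1, FTT=2": 1 / 3,
    "RAID-5, FTT=1": 3 / 4,
    "RAID-6, FTT=2": 4 / 6,
}


def free_usable_tb(free_raw_tb: float, rule: str) -> float:
    """Rough estimate of free usable capacity for objects using `rule`."""
    return free_raw_tb * USABLE_FRACTION[rule]


if __name__ == "__main__":
    for rule in USABLE_FRACTION:
        print(f"{rule}: {free_usable_tb(4.02, rule):.2f} TB free usable")
```

With 4.02TB of free raw capacity, RAID-1 with FTT=1 comes out at 2.01TB, which matches the UI. RAID-5 comes out at about 3.015TB, close to but not exactly the 3.03TB that vSAN reports. Treat the sketch as a back-of-the-envelope check. The estimate in the vSphere Client remains the number to rely on.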
vSAN Capacity Overview Metrics
As for the rest of the Capacity Overview metrics, each one displays its own definition when you hover the mouse cursor over it, as shown below.
In the screen shot we see the definition of the Used – VM overreserved metric. Let’s spend a bit of time on this metric in particular…
vSAN thin-provisions objects by default. As an example, a 100GB virtual disk that contains 15GB of data will consume 15GB of free usable capacity. Naturally, more data can be placed on this virtual disk since we are only using 15GB of the 100GB configured. If the 100GB virtual disk is completely full, it consumes 100GB of usable capacity.
Some of the objects on our vSAN datastore are assigned a storage policy with an Object Space Reservation (OSR) rule set to Thick Provisioning. OSR is commonly used for important workloads whose storage consumption grows dynamically, so that the capacity they need is guaranteed to be available. vSAN reserves the full configured capacity for objects with OSR. For example, vSAN will reserve 100GB of usable capacity for a 100GB virtual disk even if it contains only 15GB of actual data. The Used – VM overreserved metric shows the difference between the amount of reserved capacity and used capacity. This reserved capacity can be freed by assigning a storage policy that does not include an object space reservation.
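The difference between thin provisioning and a thick OSR can be modeled in a few lines. This is a minimal sketch using a hypothetical `VirtualDisk` class. It works in usable capacity, so it does not apply the mirroring or erasure-coding multiplier, and it only models the two cases described above: thin, or an OSR set to Thick Provisioning. It illustrates the idea behind the Used – VM overreserved metric and is not how vSAN computes it.

```python
from dataclasses import dataclass


@dataclass
class VirtualDisk:
    """Hypothetical model of a virtual disk, for illustration only."""
    name: str
    configured_gb: float  # provisioned size of the virtual disk
    written_gb: float     # data actually written to it
    thick: bool = False   # True if the policy's OSR rule is Thick Provisioning


def usable_consumed_gb(disk: VirtualDisk) -> float:
    """Usable capacity (before mirroring/erasure coding) charged for this disk."""
    # Thin disks consume only what has been written; a thick OSR reserves
    # the full configured size up front.
    return disk.configured_gb if disk.thick else disk.written_gb


def overreserved_gb(disk: VirtualDisk) -> float:
    """Capacity reserved but not yet written (the idea behind 'VM overreserved')."""
    return usable_consumed_gb(disk) - disk.written_gb


if __name__ == "__main__":
    disks = [
        VirtualDisk("app-thin", configured_gb=100, written_gb=15),
        VirtualDisk("db-thick", configured_gb=100, written_gb=15, thick=True),
    ]
    print(sum(usable_consumed_gb(d) for d in disks))  # 115
    print(sum(overreserved_gb(d) for d in disks))     # 85
```

Both example disks hold 15GB of data. The thin disk is charged 15GB and the thick one 100GB, so 85GB shows up as overreserved.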
Maintain Adequate Slack Space
As with any storage platform, bad things happen when you run out of capacity. VMware recommends keeping 25-30% of raw capacity free on the vSAN datastore for several reasons, such as snapshots and the one discussed in this blog article: vSAN Operations: Maintain Slack Space for Storage Policy Changes.
With that in mind, vSAN administrators should include this “slack space” or free space in their calculations of usable capacity. Using our example 5.82TB datastore above, and assuming 25% of the raw capacity is set aside for slack space and the vSAN Default Storage Policy is assigned to our virtual machines, the recommended usable capacity for VM data is about 2.18TB:
5.82TB raw × 0.75 (25% slack space) = 4.365TB raw
4.365TB raw / 2 (two mirrored copies of each object) ≈ 2.18TB usable
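The same two-step calculation can be written as a small helper for trying other slack percentages. This is a sketch under the assumptions stated in the text: slack is a fraction of total raw capacity, and the vSAN Default Storage Policy (RAID-1, FTT=1) means two copies of each object. The function name is hypothetical.

```python
def usable_after_slack_tb(raw_tb: float, slack: float, copies: int) -> float:
    """Usable capacity left for VM data after setting aside slack space.

    raw_tb: total raw datastore capacity
    slack:  fraction of raw capacity kept free (e.g. 0.25 for 25%)
    copies: mirrored copies per object (2 for RAID-1 with FTT=1)
    """
    if not 0 <= slack < 1:
        raise ValueError("slack must be in [0, 1)")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return raw_tb * (1 - slack) / copies


if __name__ == "__main__":
    for slack in (0.25, 0.30):
        usable = usable_after_slack_tb(5.82, slack, 2)
        print(f"{slack:.0%} slack: {usable:.3f} TB usable")
```

At 25% slack this reproduces the roughly 2.18TB figure above. At 30% slack, the usable capacity drops to about 2.04TB.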
Fortunately, vSAN Health will raise an alert if free space drops below 20% on any of the physical vSAN capacity drives. When this happens, vSAN initiates a reactive rebalance operation to more evenly distribute vSAN components across capacity drives and free up space on the highly utilized capacity drive(s). VMware Knowledge Base article 2108907 has more information. vCenter Server alarms should also be used to monitor capacity and raise an alert when the vSAN datastore free space gets low. We will discuss these items in the next article.
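The per-drive threshold described above is easy to picture as a simple check. This sketch does not call any vSAN API, and it is not how vSAN Health is implemented. It just applies the 20% free-space threshold from the text to hypothetical per-drive figures (drive names and sizes are made up) that you might collect from monitoring.

```python
# Free-space threshold below which, per the text, vSAN Health raises an alert
# and vSAN starts a reactive rebalance.
FREE_SPACE_ALERT_FRACTION = 0.20


def drives_below_free_threshold(drives: dict[str, tuple[float, float]]) -> list[str]:
    """Return the names of capacity drives with less than 20% free space.

    `drives` maps a hypothetical drive name to (capacity_gb, used_gb).
    """
    flagged = []
    for name, (capacity_gb, used_gb) in drives.items():
        free_fraction = (capacity_gb - used_gb) / capacity_gb
        if free_fraction < FREE_SPACE_ALERT_FRACTION:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    sample = {
        "host1-disk1": (1000, 850),  # 15% free
        "host1-disk2": (1000, 600),  # 40% free
        "host2-disk1": (1000, 790),  # 21% free
    }
    print(drives_below_free_threshold(sample))  # ['host1-disk1']
```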
@jhuntervmware on Twitter