With each new release of vSAN, it is easy to focus solely on the flagship features and functionality introduced. Enhancements like data-at-rest encryption or adaptive resynchronization tend to get the most fanfare because they address a very specific challenge. Beneath the headlining features lie enhancements that may benefit users more than first impressions suggest. Integration enhancements typically fall into this category.
The vSAN engineering teams work closely with other product development teams to ensure that our APIs offer the capabilities to allow for ongoing improvements in these other products. vRealize Operations is a fine example of this: the recent introduction of vR Ops 7.0 brings impressive improvements for vSAN customers. One of those improvements is awareness of vSAN stretched clusters.
Improving visibility in stretched cluster environments
The “Hosts and Clusters” view in vCenter has always provided a nice visual representation of the hosts that make up a cluster and the clusters that live in a data center. Hosts are listed in alphanumeric order under the cluster name. When running a vSAN stretched cluster, however, vCenter is limited in its ability to present hosts logically separated into their respective sites. This limitation extended to other solutions such as vRealize Operations. vRealize Operations 7.0 solves this issue with full awareness of vSAN stretched clusters. Dashboards in vRealize Operations can be constructed using this logical separation of “sites.” Figure 1 shows how vCenter views a vSAN stretched cluster in the “Hosts and Clusters” view, and how vR Ops can view the same vSAN stretched cluster, distinguishing the resources that form Site A and Site B.
Figure 1. Comparing a vSAN stretched cluster in the Hosts and Clusters view in vCenter to vRealize Operations
Ways to use this new level of visibility
Stretched cluster awareness introduces new and interesting ways for data center administrators to gain at-a-glance status conditions, as well as deeper-level insight and analytics. Figure 2 shows a very simple dashboard that gives a quick view of the status of the specific hosts that comprise each site within a vSAN stretched cluster, while providing full drill-down capabilities into each object presented in the widget.
Figure 2. Status and visibility of the hosts that make up a vSAN stretched cluster
Most importantly, a user will be able to better understand the resource demands on each site. Figure 3 shows a custom dashboard that presents the current and historical resource usage of each site within a single vSAN stretched cluster. The graphs showing historical utilization percentage for CPU, RAM, and disk space can be easily toggled on and off by clicking on the metric.
Figure 3. Resource utilization (in %) of hosts at each site for a vSAN stretched cluster
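To make the idea of site-level aggregation concrete, here is a minimal Python sketch of rolling per-host numbers up into a utilization percentage per site. It does not call vR Ops or any vSAN API; the `HostSample` type, the host names, and every capacity figure are hypothetical values chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HostSample:
    """Usage and capacity for one host (all values are made up)."""
    name: str
    site: str
    cpu_used_ghz: float
    cpu_total_ghz: float
    ram_used_gb: float
    ram_total_gb: float
    disk_used_tb: float
    disk_total_tb: float


def site_utilization(hosts):
    """Sum used and total capacity per site, then return percentages."""
    totals = defaultdict(lambda: defaultdict(float))
    for h in hosts:
        t = totals[h.site]
        t["cpu_used"] += h.cpu_used_ghz
        t["cpu_total"] += h.cpu_total_ghz
        t["ram_used"] += h.ram_used_gb
        t["ram_total"] += h.ram_total_gb
        t["disk_used"] += h.disk_used_tb
        t["disk_total"] += h.disk_total_tb
    return {
        site: {
            res: round(100.0 * t[f"{res}_used"] / t[f"{res}_total"], 1)
            for res in ("cpu", "ram", "disk")
        }
        for site, t in totals.items()
    }


# Hypothetical four-host stretched cluster, two hosts per site.
HOSTS = [
    HostSample("esx01", "Site A", 30, 50, 300, 512, 12, 20),
    HostSample("esx02", "Site A", 35, 50, 340, 512, 14, 20),
    HostSample("esx03", "Site B", 15, 50, 128, 512, 8, 20),
    HostSample("esx04", "Site B", 10, 50, 128, 512, 6, 20),
]

for site, pct in sorted(site_utilization(HOSTS).items()):
    print(site, pct)
```

Summing used and total capacity before dividing weights each host by its size, which matters once hosts within a site are no longer identical.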
Why is viewing host utilization aggregated to a site level helpful? To understand this better, let’s review common recommendations in vSAN cluster design:
- When configuring a non-stretched vSAN cluster, VMware recommends deploying hosts with similar or identical configurations across all cluster members. While not a strict requirement, this simple recommendation helps minimize challenges in accommodating host failure scenarios, as each host would provide the same capacity and type of storage resources. Having symmetry across hosts also allows for a greater level of performance predictability.
- When configuring vSAN stretched clusters, VMware encourages site symmetry, where the host count and configurations are symmetrical across sites. This recommendation exists for similar reasons to non-stretched vSAN clusters, and presumes that most workloads will be using a storage policy that mirrors objects across sites.
With the power of storage policies (combined with DRS affinity rules), an administrator may choose to deliberately exclude VMs from being protected across both sites. This is done by using the site affinity policy, as shown in Figure 4. Excluding VMs from consuming storage or compute resources on one of the sites results in fewer resources being used at that site, as well as on the inter-site link connecting both sites.
Figure 4. Configuring a site affinity storage policy in a vSAN stretched cluster
Using site affinity rules can present an opportunity to reduce the host count on one site versus the other. The challenge in the past was easily visualizing the resource demands on each site in a way that shows whether host counts in a site could be optimized. vR Ops 7.0 helps address this challenge.
Applying this benefit to a stretched cluster scenario
Imagine a stretched cluster powering a variety of workloads. Some VMs run business-critical applications and use storage policies that protect the data across sites. Other VMs living in the preferred site (site “A”) are less important, used perhaps for lab and test work, and do not need protection across sites. As shown in Figure 5, using storage policies paired with DRS affinity rules is an easy way to save on inter-site link bandwidth and site capacity usage.
Figure 5. Representation of resource usage for symmetrical vSAN stretched cluster with site affinity rules set with storage policies and DRS
The next step would be to use vR Ops to help determine if the host count could be reduced.
Whereas the initial design intended to have an equal number of nodes in each site, the operational information provided in Figure 3 may indicate an opportunity to reduce the host count on one of the sites, due to the use of site affinity policies. Based on the resource usage, the user could determine that the host count at the secondary site could be reduced by one or more hosts. In this example, the reduced host count for the site would be the result of 1) looking at the disparity of resource usage at each site, as provided in the vR Ops dashboard, 2) estimating the new host count in the secondary site (site “B”), and 3) verifying the following:
- Can the secondary site provide sufficient compute and memory capacities in accordance with the applied host failure (e.g. N+1 or N+2) designs?
- Can the secondary site provide sufficient free storage capacity in accordance with the applied host failure (e.g. N+1 or N+2) designs?
- Can the secondary site provide sufficient free storage capacity for operational slack space?
- Can the secondary site still support the minimum host count required by any of the used storage policies in the cluster? (e.g. 6 hosts for RAID-6, with an additional host for N+1, equaling 7 hosts)
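The checks above can be sketched as a small Python calculation. This is only an illustration under simplifying assumptions, not a sizing tool: it assumes identical hosts at the site, treats N+1 as "demand must still fit with one host removed," and applies the slack reservation to the site's full capacity. The `HostSpec` and `SiteDemand` types, the 30% slack figure, the policy minimum of 3 hosts, and all demand numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostSpec:
    """Capacity of one host at the site (hypothetical values)."""
    cpu_ghz: float
    ram_gb: float
    storage_tb: float


@dataclass
class SiteDemand:
    """Resources the site's workloads consume (hypothetical values)."""
    cpu_ghz: float
    ram_gb: float
    storage_tb: float


def check_host_count(hosts, spec, demand, failures_to_tolerate=1,
                     slack_fraction=0.30, policy_min_hosts=3):
    """Answer each checklist question for a proposed host count."""
    surviving = hosts - failures_to_tolerate  # hosts left after N+x failures
    return {
        "compute_and_memory": (surviving * spec.cpu_ghz >= demand.cpu_ghz
                               and surviving * spec.ram_gb >= demand.ram_gb),
        "storage_after_failure": surviving * spec.storage_tb >= demand.storage_tb,
        # Assumption: slack is reserved against capacity with all hosts present.
        "slack_space": (hosts * spec.storage_tb * (1 - slack_fraction)
                        >= demand.storage_tb),
        "policy_minimum": hosts >= policy_min_hosts + failures_to_tolerate,
    }


def smallest_viable_host_count(current_hosts, spec, demand, **kwargs):
    """Return the lowest host count passing every check, or None."""
    for n in range(1, current_hosts + 1):
        if all(check_host_count(n, spec, demand, **kwargs).values()):
            return n
    return None


# Hypothetical secondary site: 8 hosts today, with modest demand.
SPEC = HostSpec(cpu_ghz=50, ram_gb=512, storage_tb=20)
DEMAND = SiteDemand(cpu_ghz=120, ram_gb=1400, storage_tb=60)

print(smallest_viable_host_count(8, SPEC, DEMAND))
print(check_host_count(4, SPEC, DEMAND))
```

Returning one boolean per question, rather than a single pass or fail, shows which constraint is the limiting one for a proposed host count.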
This information would provide the insight necessary to run a stretched cluster with asymmetry, so long as the answer to each of the considerations above is “yes.” The degree of asymmetry would depend on how extensively site affinity storage policies are used, and on other storage policy requirements. This optimized topology translates into a more cost-effective stretched cluster environment, as shown in Figure 6.
Figure 6. Representation of resource usage for optimized vSAN stretched cluster with storage policy and DRS affinity rules
Conclusion
Adding visibility of vSAN stretched clusters in vRealize Operations opens up new opportunities for the design, operation, and optimization of vSAN environments. vSAN administrators can now easily monitor site-level resources in their stretched cluster environments and customize their stretched cluster topology to best suit their needs.
A special thanks to Chris McClanahan in the VMware Cloud Management Business Unit for his collaboration in this effort.