
Hyper-converged systems like vSAN are well known for their ability to scale out linearly. Scaling out storage performance and capacity in step with compute resources is a powerful concept. Simply adding nodes to a cluster allows these resources to grow proportionally in a way that meets technical demands, while using a more predictable purchasing model. The reality is that storage capacity requirements may not always align with what was initially prescribed in a host. The needs of the application owners may justifiably change, but adding nodes and the associated licensing to your environment may be technically unnecessary and financially infeasible.

Environments are driven not only by the changing needs of an organization, but also by the changing technologies that are readily available. While the price per gigabyte (GB) of flash has been plummeting in recent months, an equally, if not more, compelling aspect is the extraordinary storage capacity of these newer flash devices. How can you take advantage of these changes in your own vSAN environment? Let’s take a look.

Disk Groups

vSAN’s method of managing physical devices gives you the power of choice when it comes to accommodating change. A disk group is a vSAN construct: a logical container on each host consisting of a group of devices (flash or spinning disk) that provide persistent storage, plus a single device for caching. Each host that contributes storage in a vSAN cluster must have at least one disk group, and all disk groups across the cluster contribute to a single vSAN datastore. Using multiple disk groups per host can improve storage performance significantly while limiting the size of a failure domain, reducing the amount of data impacted by a physical device failure.
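These relationships can be sketched in a few lines of code. This is a minimal illustrative model, not vSAN internals; the class names and device sizes are assumptions chosen for the example.

```python
# Illustrative model of the disk-group concept; names and GB sizes are
# hypothetical, not actual vSAN internals.

class DiskGroup:
    """One cache device plus one or more capacity devices on a host."""
    def __init__(self, cache_gb, capacity_devices_gb):
        self.cache_gb = cache_gb
        self.capacity_devices_gb = list(capacity_devices_gb)

    @property
    def capacity_gb(self):
        # Only capacity-tier devices contribute usable space; the cache
        # device acts as a buffer, not as capacity.
        return sum(self.capacity_devices_gb)

def datastore_capacity_gb(hosts):
    """All disk groups on all hosts pool into a single vSAN datastore."""
    return sum(dg.capacity_gb for host in hosts for dg in host)

# Four hosts, each with two disk groups of 7 x 400 GB capacity devices.
hosts = [[DiskGroup(400, [400] * 7) for _ in range(2)] for _ in range(4)]
print(datastore_capacity_gb(hosts))  # 4 hosts * 2 groups * 7 * 400 GB = 22400
```

Note how the disk group also bounds the failure domain: losing one group's cache device takes only that group's 2,800 GB offline, not the host's full 5,600 GB.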

Balance, and the question of scale up versus scale out

Regardless of the architecture used, environments often have a ratio of compute, memory, network, storage capacity, and storage performance that is unique to the set of workloads running in that particular environment. A common challenge when adding a large amount of relatively fast storage to an environment is ensuring there is enough compute power to service the VMs that might consume that additional capacity. If compute resources are lacking, that additional capacity is “stranded” because there are insufficient resources to service it. This can be especially challenging with some hyper-converged infrastructure solutions, since the ratio is predefined upon the initial build of the hosts, and may not reflect the needs of the workloads in a given environment.

This is where the architecture of vSAN shines. Because vSAN is built right into the hypervisor, it is extremely efficient in its use of resources to service I/O. Since it is part of the kernel, resources are not taken away from, or left contending with, workloads for the sole purpose of servicing storage I/O, as is common in HCI solutions that use a controller VM on each host. This gives vSAN much more flexibility to scale up capacity on your hosts while maintaining balance with the other resources. That additional capacity could be used by additional VMs, or by existing VMs that require more virtual resources.

The combination of disk groups and the efficiency of its integration right into the hypervisor allows vSAN to give you the power of choice in how to best accommodate your technical requirements. Scale out by adding nodes when you need storage and compute. Or scale up by adding disks and disk groups when you need storage capacity and associated performance.
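A quick back-of-the-envelope comparison makes the two growth paths concrete. All numbers here are hypothetical, chosen only to illustrate the trade-off.

```python
# Hypothetical comparison of scale-out vs. scale-up for a 4-node cluster.
node_capacity_gb = 5600   # assumed usable capacity contributed per node
nodes = 4

# Scale out: add a fifth node -> more compute AND more storage,
# plus the cost of the host and its licensing.
scale_out_gb = (nodes + 1) * node_capacity_gb

# Scale up: add one 7 x 400 GB disk group to each existing node ->
# more storage only, with no new hosts or licenses.
added_group_gb = 7 * 400
scale_up_gb = nodes * (node_capacity_gb + added_group_gb)

print(scale_out_gb)  # 28000
print(scale_up_gb)   # 33600
```

Under these assumptions, scaling up actually adds more raw capacity than scaling out, which is the right choice only when storage, not compute, is the constrained resource.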

Options for adding storage capacity to hosts

Below are three approaches to increasing storage capacity on a host running vSAN. The term “flash devices” is used instead of SSDs, primarily because the latest flash technologies now come in a variety of form factors and protocols. Common device types used by servers now include 2.5″ SATA, 2.5″ SAS, 2.5″ U.2 NVMe, M.2 NVMe, and PCIe NVMe add-in cards.

Approach #1: Replace flash devices in an existing disk group with higher capacity devices.

[Figure: replacing disks in a single disk group]

Advantages:

  • Increase capacity.
  • Increase performance by using newer, better performing devices.

Disadvantages:

  • Removing existing flash devices to make room for higher capacity devices may introduce risk during the transition.
  • Larger capacity leads to larger failure domain.


Approach #2: Add flash devices as capacity to an existing disk group.

[Figure: adding disks to a single disk group]

Advantages:

  • Incrementally increase capacity.
  • No rip and replace of existing flash devices.
  • Increase performance by using more devices at persistent tier.

Disadvantages:

  • May not have additional open drive bays.
  • Larger capacity leads to larger failure domain.
  • Unless all drives are replaced, capacity, and possibly performance, become asymmetric across devices.


Approach #3: Add flash devices as capacity, and a flash device as buffering to a new disk group.

[Figure: adding disks to a new disk group]

Advantages:

  • Incrementally increase capacity with latest flash technologies.
  • No rip and replace of existing flash devices.
  • Increase performance by using additional device as write buffer, and more threads to service I/O requests.

Disadvantages:

  • May not have additional open drive bays.
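The three approaches can be compared side by side with some hypothetical per-host numbers. Assume the original disk group holds 7 x 400 GB capacity devices, and that 800 GB devices are available at upgrade time; these sizes are illustrative assumptions.

```python
# Hypothetical per-host comparison of the three approaches.
# Original disk group: 7 x 400 GB capacity devices (plus a cache device).
original = [400] * 7

# Approach 1: replace the 400 GB devices with 800 GB devices.
approach_1 = [800] * 7

# Approach 2: add two 800 GB devices to the existing disk group.
approach_2 = original + [800] * 2

# Approach 3: keep the original group and add a new 7 x 800 GB group.
approach_3_groups = [original, [800] * 7]

print(sum(approach_1))                         # 5600 GB, one large group
print(sum(approach_2))                         # 4400 GB, asymmetric devices
print(sum(sum(g) for g in approach_3_groups))  # 8400 GB across two groups

# Failure domain: the largest single disk group lost if its cache fails.
print(max(sum(g) for g in approach_3_groups))  # 5600 GB
```

The numbers show the pattern described above: approaches #1 and #2 grow one disk group (and therefore one failure domain), while approach #3 adds the most capacity and splits it across two failure domains.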

Do all hosts need to have the same additions? No, but the best cluster designs strive for consistent resources across all hosts when it comes to CPU and memory. In hyper-converged infrastructure environments, storage components should follow the same principle of consistency. If you plan on adding devices to a new or existing disk group, it is recommended to add those same new devices across all hosts in the cluster.

Strategic approach to purchasing

Now let’s put this knowledge to use for a better way to address your needs today, while protecting the architecture against invalidation by tomorrow’s innovations. Approach #3 (adding a new all-flash disk group) will be used.

A cluster of four Dell R730xd hosts will be used in this example, but the same approach could be used with any system on the VMware Compatibility Guide for vSAN. The R730xd has 24 bays up front, and 2 in the rear. These bays will accept 2.5″ SATA or SAS drives, with 4 bays accepting 2.5″ U.2 based NVMe drives.

Upon initial purchase, one could choose to populate just 16 of the drive bays, and split those into two disk groups. In each disk group, this would leave 7 flash devices for a capacity tier, and one device as a write buffer.

In 12 to 18 months, it may be necessary to grow capacity, and the remaining 8 empty bays can then be populated in a similar manner to the other drive bays, yet be configured as a third disk group. Given the market conditions of flash, the capacity tier of just that one disk group may be twice as dense, yet half the cost of an original disk group. The devices available for a write buffer may have performance capabilities well beyond the devices originally purchased.
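Putting rough numbers on this scenario: the initial and later device sizes below are assumptions for illustration, with the later devices twice as dense, as the flash market trend described above suggests.

```python
# Hypothetical capacity math for the four-host R730xd scenario above.
initial_device_gb = 400   # assumed capacity-device size at initial purchase
later_device_gb = 800     # assumed: twice as dense 12-18 months later
hosts = 4

# Day one: 16 populated bays per host -> two disk groups,
# each with 7 capacity devices and 1 write buffer.
day_one_per_host = 2 * 7 * initial_device_gb

# Later: the 8 empty bays become a third disk group
# (7 capacity devices + 1 write buffer).
third_group_per_host = 7 * later_device_gb

print(hosts * day_one_per_host)                           # 22400 GB
print(hosts * (day_one_per_host + third_group_per_host))  # 44800 GB
```

Under these assumptions the third disk group alone doubles the cluster's raw capacity, which is the conservative scenario described below.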

[Figure: vSAN cluster drive bay and disk group layout]


*Drive bay and disk group associations simplified for clarity. May not reflect actual locations.

Imagine some of the possibilities. In this conservative scenario, one might easily double the capacity of the cluster just by adding devices in the future, without any incremental costs beyond the handful of flash devices. The benefits stand in clear contrast to traditional purchasing models and the limitations of other solutions:

  • Lower up-front costs while taking advantage of the economics of flash.
  • Add capabilities to your data center without additional capital purchases or licensing.
  • Investment protection by potentially extending the life of the initial purchase.

Scaling out a vSAN cluster with additional nodes is a legitimate option when there is a need for additional compute and storage resources. Adding nodes to a vSAN cluster is quite common, and easy to do. Yet, as demonstrated above, vSAN also offers tremendous flexibility in the event that you only need to scale up your storage resources. This is yet another example of how vSAN puts the power of choice back in the hands of the end user.