PM Hub: Better Site Protection with Stretched Clusters

Introducing PM Hub: Your one-stop shop for all things related to Storage Product Management

Today we, the Storage and Availability Product Management team, are proud to announce the launch of a new blog series called PM Hub. With this series, we aim to bring you a Product Manager’s perspective on the features available in vSAN, vVols, SRM, and more, covering use cases, benefits and a lot more.

To kick off this series, we will walk you through several features launched with the newest release of vSAN, i.e., vSAN 6.6.

vSAN 6.6 is here!

With vSAN 6.6, we have aimed to ensure that customers will be able to effortlessly:

  • Lower TCO: By moving features towards software and away from hardware, customers will see a significant reduction in their overall TCO.
  • Evolve without Risk: With new security and availability features, customers will be able to expand their HCI journey without risk.
  • Scale to Tomorrow: With a 50% improvement in performance for All-Flash vSAN, as well as support for next-generation hardware such as NVMe, customers can scale to tomorrow.

In keeping with these three key themes for vSAN 6.6, we will be publishing a number of blogs that walk through features such as Encryption, Nested Fault Domains, Resync, Unicast, etc. in detail.

So, let us begin.

In part 1 of the PM Hub series, we will focus on two features under the first theme, i.e., Lower TCO: Local Failure Protection and Site Affinity.

Local Failure Protection

What was the problem that we wanted to address?

In previous versions of vSAN stretched clusters, objects were protected only by a RAID-1 mirror across sites. This meant that in the event of a site failure, only a single copy of the object remained available, both for serving I/O with reduced redundancy and for rebuilding data. Furthermore, an additional failure could result in data loss.

How have we addressed this in vSAN 6.6?

With Local Failure Protection, customers can use policies to configure storage redundancy both within and across sites. This way, you can create local and distributed copies with RAID-1 (for Hybrid) as well as RAID-1/5/6 (for All-Flash).

These copies are created by using two new policy rules, i.e., PFTT (Primary level of Failures To Tolerate) and SFTT (Secondary level of Failures To Tolerate). PFTT defines Primary Level Protection. It is implemented as RAID-1 to mirror objects across sites. SFTT defines Secondary Level Protection, a.k.a. Local Site Protection. It is RAID-1 for Hybrid and can be RAID-1, RAID-5 or RAID-6 for All-Flash.
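
To get a feel for what a given PFTT/SFTT combination costs in raw capacity, here is a rough Python sketch. It is only an estimate under stated assumptions: it uses the standard RAID overhead factors (2x for a two-way mirror, 1.33x for RAID-5 as 3 data + 1 parity, 1.5x for RAID-6 as 4 data + 2 parity), it ignores witness components and any other metadata, and the function name `raw_capacity_gb` is made up for illustration rather than taken from any VMware tool.

```python
from fractions import Fraction

# Capacity multiplier of the local (per-site) layout, keyed by
# (fault tolerance method, SFTT). Assumed standard RAID overheads.
LOCAL_OVERHEAD = {
    ("RAID-1", 0): Fraction(1),       # no local redundancy
    ("RAID-1", 1): Fraction(2),       # two-way mirror
    ("RAID-1", 2): Fraction(3),       # three-way mirror
    ("RAID-1", 3): Fraction(4),       # four-way mirror
    ("RAID-5/6", 1): Fraction(4, 3),  # RAID-5: 3 data + 1 parity
    ("RAID-5/6", 2): Fraction(3, 2),  # RAID-6: 4 data + 2 parity
}


def raw_capacity_gb(vmdk_gb, pftt, sftt, ftm="RAID-1"):
    """Estimate the raw capacity consumed by one VMDK across the cluster.

    pftt: 0 keeps data on one site, 1 mirrors it across both sites.
    sftt/ftm: the local protection applied inside each site.
    """
    if pftt not in (0, 1):
        raise ValueError("this sketch only models PFTT = 0 or 1")
    if (ftm, sftt) not in LOCAL_OVERHEAD:
        raise ValueError(f"unsupported combination: {ftm} with SFTT = {sftt}")
    site_copies = pftt + 1  # RAID-1 across sites doubles the footprint
    return float(vmdk_gb * site_copies * LOCAL_OVERHEAD[(ftm, sftt)])


if __name__ == "__main__":
    for pftt, sftt, ftm in [(1, 1, "RAID-1"), (1, 1, "RAID-5/6"), (0, 2, "RAID-5/6")]:
        gb = raw_capacity_gb(300, pftt, sftt, ftm)
        print(f"PFTT={pftt} SFTT={sftt} {ftm}: {gb:.0f} GB raw for a 300 GB VMDK")
```

The comparison between the first two rows is the interesting part: on All-Flash, choosing erasure coding for the local copies trims the footprint of a fully protected object considerably compared with mirroring everywhere.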

Consider the following scenario: in a stretched cluster with PFTT = 1, SFTT = 1 and Fault Tolerance Method = RAID 5, the primary site is shut down and a host is then shut down on the secondary site. Not only do the VMs remain accessible, but their data also remains accessible on the secondary site. These data objects can later be used to initiate a data rebuild.
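
The scenario above can be reasoned about with a deliberately simplified model, sketched below in Python. It only counts failures: a site's copy is treated as usable if the site is online and no more than SFTT of the hosts holding its components have failed. It does not model vSAN's component voting or the witness, and `SiteState`, `surviving_copies` and `is_accessible` are hypothetical names used purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteState:
    online: bool
    failed_hosts: int = 0  # failed hosts that hold components of the object


def surviving_copies(sites, sftt):
    """Count the sites whose local copy is still usable.

    A local copy is assumed to tolerate up to `sftt` host failures.
    """
    return sum(1 for s in sites if s.online and s.failed_hosts <= sftt)


def is_accessible(sites, sftt):
    """The object is treated as accessible if at least one copy survives."""
    return surviving_copies(sites, sftt) >= 1


if __name__ == "__main__":
    # PFTT = 1 (two data sites), SFTT = 1 with RAID 5 inside each site.
    primary = SiteState(online=False)
    secondary = SiteState(online=True, failed_hosts=1)
    print(is_accessible([primary, secondary], sftt=1))  # site down + one host down

    secondary.failed_hosts = 2  # one failure more than SFTT allows
    print(is_accessible([primary, secondary], sftt=1))
```

With SFTT = 0, which is how stretched clusters behaved before vSAN 6.6, the same site failure plus a single host failure would already leave no usable copy in this model.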

What are the benefits of using Local Failure Protection?

  1. Host and disk group protection during a site failure: In the event of a host or disk group failure, data does not have to be fetched from the other site, resulting in better performance along with better protection.
  2. Local replication when a component has failed: By using local replicas of the object, data can be rebuilt within the site in the event of a host outage.

A few things to keep in mind when using Local Failure Protection:

  • Each site should have the minimum number of hosts required to satisfy the local policy. In the example above (SFTT = 1 with RAID 5), a minimum of 4 hosts per site is required
  • SFTT only becomes available (viewable in vCenter) when Stretched Clusters is enabled
  • Users can specify which VMs need to be protected across sites and which should be protected locally
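
The host-count requirement in the first point follows the usual vSAN sizing rules: a RAID-1 policy needs 2n + 1 hosts to tolerate n failures, RAID-5 needs 4 hosts and RAID-6 needs 6. The small Python helper below encodes those rules per site. It is a sketch for planning purposes, the name `min_hosts_per_site` is an assumption of this example, and it returns the bare minimum without any spare host for rebuilds.

```python
def min_hosts_per_site(sftt, ftm="RAID-1"):
    """Return the minimum number of hosts each site needs for the local policy.

    ftm is either "RAID-1" (mirroring) or "RAID-5/6" (erasure coding).
    """
    if ftm == "RAID-1":
        if not 0 <= sftt <= 3:
            raise ValueError("RAID-1 supports SFTT between 0 and 3")
        return 2 * sftt + 1  # n + 1 replicas plus n tie-breaker hosts
    if ftm == "RAID-5/6":
        if sftt == 1:
            return 4  # RAID-5: 3 data + 1 parity
        if sftt == 2:
            return 6  # RAID-6: 4 data + 2 parity
        raise ValueError("erasure coding supports SFTT = 1 or 2 only")
    raise ValueError(f"unknown fault tolerance method: {ftm}")


if __name__ == "__main__":
    for sftt, ftm in [(1, "RAID-1"), (1, "RAID-5/6"), (2, "RAID-5/6")]:
        print(f"SFTT={sftt}, {ftm}: at least {min_hosts_per_site(sftt, ftm)} hosts per site")
```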

Site Affinity

What is the use case for Site Affinity?

Several workloads in a datacenter have built-in application-level availability or redundancy, e.g., Microsoft SQL Server AlwaysOn. Typical production workloads, however, require multi-site protection for better data redundancy. So how do we cater to workloads that do not require copies to be stored on different sites?

Introducing Local Affinity

With Local Affinity, customers can use policies to keep data on a single site. In this case, PFTT = 0. This ensures that objects are not replicated to the secondary site, thereby reducing the bandwidth required between sites. Additionally, by using Affinity rules, customers can set VM/VMDK assignments to specific hosts.

For example, to test local affinity, you can set PFTT = 0, SFTT = 2, FTM = RAID-5/6 (erasure coding). The expected outcome of this test is that all I/O is served locally and none of it goes to the secondary site. This way, customers can seamlessly achieve host and disk protection for objects that do not require site protection.

A few housekeeping rules for local affinity:

  1. Affinity will only be available when Stretched Clusters is enabled
  2. DRS/HA rules should be aligned with Data Locality
  3. RAID-0/RAID-1 are supported for Hybrid and RAID-0/RAID-1/RAID-5/RAID-6 are supported for All-Flash
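
The first and third rules lend themselves to a quick pre-flight check before a policy is assigned. The Python sketch below is one way to express them. It is not a VMware API: `check_affinity_policy`, the cluster type labels and the RAID level strings are all hypothetical, and the DRS/HA alignment rule is left out because it depends on the actual VM placement rules in your environment.

```python
# RAID levels allowed for local affinity, per the housekeeping rules above.
SUPPORTED_RAID = {
    "hybrid": {"RAID-0", "RAID-1"},
    "all-flash": {"RAID-0", "RAID-1", "RAID-5", "RAID-6"},
}


def check_affinity_policy(stretched_enabled, cluster_type, raid_level):
    """Return a list of human-readable problems; an empty list means OK."""
    if cluster_type not in SUPPORTED_RAID:
        raise ValueError(f"unknown cluster type: {cluster_type}")
    problems = []
    if not stretched_enabled:
        problems.append("Affinity requires Stretched Clusters to be enabled")
    if raid_level not in SUPPORTED_RAID[cluster_type]:
        problems.append(f"{raid_level} is not supported on a {cluster_type} cluster")
    return problems


if __name__ == "__main__":
    print(check_affinity_policy(True, "all-flash", "RAID-6"))
    print(check_affinity_policy(False, "hybrid", "RAID-5"))
```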

Conclusion:

Local Failure Protection and Site Affinity in vSAN 6.6 reduce the design complexity and improve the performance of stretched clusters in a data center, thereby delivering a significant TCO reduction for customers. As long as you enable Stretched Clusters with vSAN, you will be able to easily use these two new features.

To learn more about vSAN, visit VMware vSAN, Virtual Blocks and StorageHub.

In the meantime, stay tuned for more from PM Hub.