VMware Cloud Foundation

VMware Cloud Foundation on Dell EMC VxRail™ Blog Series – Part 3: Stretched Cluster

Authors: 

  • Amitrajit Chatterjee, Staff Engineer 
  • Peter Brown, Director R&D 
  • Tony Huynh, Director Product Management 

Now that we have covered the features, deployment, and management procedures in part 2 of this blog series, let’s focus on some of the disaster recovery aspects. vSAN stretched clusters extend a cluster from a single data site to two sites for a higher level of availability and inter-site load balancing. Because stretched clusters have maximum latency requirements, they are typically deployed across data centers separated by limited distances, such as metropolitan or campus environments.

Stretched clusters can be used either for planned maintenance of the primary site or to handle disaster recovery scenarios, since one site is expected to remain up and running. If a problem occurs on the primary site, vSphere HA automatically restarts the affected VMs on the remaining active site.

For Cloud Foundation on VxRail systems, the Management Domain must be stretched before any Workload Domain cluster can be stretched. This ensures that all infrastructure VMs, including the vCenters for the Workload Domains, are available on the secondary site if the primary site fails. Each site is termed an Availability Zone. The recommended minimum number of hosts in each Availability Zone is 4 and the maximum is 15. If you are expanding a cluster, you must add hosts in pairs, one host to each Availability Zone. Both hosts in a pair must have the same CPU, memory, and storage.
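The sizing and pairing rules above lend themselves to a quick pre-check before an expansion. The following is a minimal Python sketch, not part of Cloud Foundation or VxRail tooling: the HostSpec class, the function names, and the host names are hypothetical, and the limits are simply the numbers quoted above.

```python
from dataclasses import dataclass

MIN_HOSTS_PER_AZ = 4   # recommended minimum quoted in this post
MAX_HOSTS_PER_AZ = 15  # maximum quoted in this post


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical description of one host's hardware."""
    name: str
    cpu_cores: int
    memory_gb: int
    storage_tb: float

    def hardware(self):
        # Only these three attributes are compared, mirroring the rule above.
        return (self.cpu_cores, self.memory_gb, self.storage_tb)


def check_az_size(host_count):
    """Return a list of problems with the host count of one Availability Zone."""
    problems = []
    if host_count < MIN_HOSTS_PER_AZ:
        problems.append(f"{host_count} hosts is below the recommended minimum of {MIN_HOSTS_PER_AZ}")
    if host_count > MAX_HOSTS_PER_AZ:
        problems.append(f"{host_count} hosts exceeds the maximum of {MAX_HOSTS_PER_AZ}")
    return problems


def mismatched_pairs(pairs):
    """Return the (primary, secondary) name pairs whose hardware differs."""
    return [(a.name, b.name) for a, b in pairs if a.hardware() != b.hardware()]


if __name__ == "__main__":
    primary = HostSpec("az1-host", 32, 512, 20.0)     # hypothetical hosts
    secondary = HostSpec("az2-host", 32, 256, 20.0)
    print(check_az_size(3))
    print(mismatched_pairs([(primary, secondary)]))
```

An empty list from either function means that particular rule is satisfied; the sketch says nothing about network or licensing prerequisites.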

A VxRail environment supports only L2 connectivity between the two sites for stretching. L3 connectivity is preferred from either site to the Witness Host environment. Each cluster that is stretched requires its own Witness Host.

In the current release there is no Graphical User Interface (GUI) option for stretching, so a command-line tool must be used. This will be enhanced in future releases. Upcoming features will also include the ability to un-stretch a stretched cluster.

There are two phases to stretching a cluster, Preparation and Stretching, described in more detail below.

 

Preparation  

This phase prepares the primary Availability Zone for stretching. It must be executed for each cluster that will be stretched. The required command is:

./sos --prepare-stretch --sc-domain <domain_name> --sc-cluster <cluster_name>
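Because the preparation command has to be run once per cluster, it can help to generate the invocations from a list. The following is a minimal Python sketch under a few assumptions: the long options are typed with two ASCII hyphens, the helper name and the cluster list are illustrative only, and the commands are printed rather than executed.

```python
import shlex


def prepare_stretch_argv(domain, cluster):
    """Build the argument vector for one preparation run (hypothetical helper)."""
    return ["./sos", "--prepare-stretch", "--sc-domain", domain, "--sc-cluster", cluster]


if __name__ == "__main__":
    # (domain, cluster) pairs to prepare; replace with your own names.
    clusters = [("MGMT", "VxRail-Virtual-SAN-Cluster-9ccefba1")]
    for domain, cluster in clusters:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(prepare_stretch_argv(domain, cluster)))
```

Keeping the arguments as a list avoids quoting mistakes if a domain or cluster name ever contains a space or shell metacharacter.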

The command performs the following tasks:

  1. Creates a host group in vCenter containing the hosts in the primary Availability Zone.
  2. Creates a VM group in vCenter containing all the infrastructure VMs running on hosts in the primary Availability Zone.
  3. Creates a VM-Host affinity rule in vCenter.

Sample output:

# ./sos --prepare-stretch --sc-domain MGMT --sc-cluster VxRail-Virtual-SAN-Cluster-9ccefba1

Welcome to Supportability and Serviceability(SoS) utility! 

Logs : /var/log/vmware/vcf/sddc-support/stretchCluster-2019-02-19-06-31-17-60257 

Stretch Cluster operation log : /var/log/vmware/vcf/sddc-support/stretchCluster-2019-02-19-06-31-17-60257/sos.log 

Starting vSAN stretched cluster operations.. 

vSAN prepare stretch operation started.. 

  

Api Response:{
  "taskId": "dcd28a70-8dcf-437c-a476-4fcaac278822",
  "resourceId": "6e9d20bd-87a1-4575-a65b-ae0e2310c253",
  "resourceType": "ESXI",
  "state": "IN_PROGRESS",
  "description": "Extends VxRail vSAN cluster from a single data site to two sites",
  "timestamp": 1550557877854,
  "id": "dcd28a70-8dcf-437c-a476-4fcaac278822"
}

Once the workflow is triggered, task status can be monitored from the SDDC Manager.  
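If the command output is captured to a file or variable, the task ID and start time can be pulled out of the "Api Response" block for later cross-checking against the SDDC Manager. The following is a minimal Python sketch; it assumes the tool prints standard JSON with straight quotes after the literal text "Api Response:" and that the timestamp is in milliseconds since the Unix epoch. The function names are hypothetical and the sample is abridged from the output above.

```python
import json
from datetime import datetime, timezone

MARKER = "Api Response:"


def parse_api_response(output):
    """Return the JSON object that follows the 'Api Response:' marker."""
    start = output.index(MARKER) + len(MARKER)
    # raw_decode parses one JSON value and ignores any text after it.
    task, _ = json.JSONDecoder().raw_decode(output[start:].lstrip())
    return task


def task_started_at(task):
    """Convert the millisecond timestamp to a UTC datetime (whole seconds)."""
    return datetime.fromtimestamp(task["timestamp"] // 1000, tz=timezone.utc)


if __name__ == "__main__":
    sample = """vSAN prepare stretch operation started..
Api Response:{
  "taskId": "dcd28a70-8dcf-437c-a476-4fcaac278822",
  "state": "IN_PROGRESS",
  "timestamp": 1550557877854
}
"""
    task = parse_api_response(sample)
    print(task["taskId"], task["state"], task_started_at(task).isoformat())
```

Read this way, the timestamp lines up with the time embedded in the log directory name, which is consistent with the millisecond assumption.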

 

Stretching 

After preparation is complete, the cluster is ready to be stretched. Stretching also involves several tasks:

  1. Deploy a Witness Host outside of the Cloud Foundation on VxRail system. Follow the vSAN guide for deployment steps.
  2. Add the Witness Host to the appropriate domain vCenter.
  3. Prepare the hosts that will be part of the secondary Availability Zone. They must be imaged following the standard VxRail procedure, connected to the same L2 network as the primary Availability Zone, and then powered on.
  4. Complete the VxRail cluster expansion from the VxRail Manager plugin on the appropriate domain vCenter.

The command for stretching is:
 

./sos --stretch-vsan --sc-domain <domainName> --sc-cluster <clusterName> --sc-hosts <hostNames> --witness-host-fqdn <witness host/appliance IP or fqdn> --witness-vsan-ip <IPaddress> --witness-vsan-cidr <IPaddressWithMask> --vsan-gateway-ip <vsan-gateway-ip-address>
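With this many network parameters, a typo is easy to make, so a small check before running the command can save a failed workflow. The following is a minimal Python sketch that assembles the arguments and validates the addresses with the standard ipaddress module. It assumes the long options use two ASCII hyphens, that host names are passed as a comma-separated list, and that the witness vSAN IP should fall inside the witness vSAN CIDR; the function name and the gateway value are hypothetical.

```python
import ipaddress
import shlex


def stretch_vsan_argv(domain, cluster, hosts, witness, witness_vsan_ip,
                      witness_vsan_cidr, vsan_gateway_ip):
    """Validate the addresses and build the argument vector (hypothetical helper)."""
    # strict=False accepts a CIDR whose host bits are set, e.g. 172.18.94.0/22.
    network = ipaddress.ip_network(witness_vsan_cidr, strict=False)
    if ipaddress.ip_address(witness_vsan_ip) not in network:
        raise ValueError(f"{witness_vsan_ip} is not inside {witness_vsan_cidr}")
    ipaddress.ip_address(vsan_gateway_ip)  # raises ValueError if malformed
    return [
        "./sos", "--stretch-vsan",
        "--sc-domain", domain,
        "--sc-cluster", cluster,
        "--sc-hosts", ",".join(hosts),
        "--witness-host-fqdn", witness,
        "--witness-vsan-ip", witness_vsan_ip,
        "--witness-vsan-cidr", witness_vsan_cidr,
        "--vsan-gateway-ip", vsan_gateway_ip,
    ]


if __name__ == "__main__":
    argv = stretch_vsan_argv(
        "MGMT",
        "VxRail-Virtual-SAN-Cluster-9ccefba1",
        ["h005.rainpole.local", "h006.rainpole.local", "h007.rainpole.local"],
        "172.18.93.242",
        "172.18.94.24",
        "172.18.94.0/22",
        "172.18.90.1",  # hypothetical gateway, not taken from the sample run
    )
    print(shlex.join(argv))
```

The sketch only prints the command; the ESXi root passwords are still requested interactively by the tool itself.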

The command performs the following tasks:

  1. Creates the secondary Availability Zone host group in vCenter with the hosts specified in --sc-hosts <hostNames>.
  2. Configures a static route on all the ESXi hosts to ensure that hosts in both Availability Zones can reach the witness.
  3. Creates an empty VM group in vCenter in case the customer wants to associate some VMs with the secondary Availability Zone.
  4. Deploys NSX-V VIBs onto the newly added hosts.
  5. Reconfigures the NSX controller by applying the new storage policy (based on the primary and secondary failures to tolerate, PFTT and SFTT) to the newly added hosts.

Sample output:

# ./sos --stretch-vsan --sc-domain MGMT --sc-cluster VxRail-Virtual-SAN-Cluster-9ccefba1 --sc-hosts h005.rainpole.local,h006.rainpole.local,h007.rainpole.local --witness-fqdn 172.18.93.242 --witness-vsan-ip 172.18.94.24 --witness-vsan-cidr 172.18.94.0/22

Welcome to Supportability and Serviceability(SoS) utility! 

Logs : /var/log/vmware/vcf/sddc-support/stretchCluster-2019-02-18-09-26-30-15557 

Stretch Cluster operation log : /var/log/vmware/vcf/sddc-support/stretchCluster-2019-02-18-09-26-30-15557/sos.log 

Starting vSAN stretched cluster operations.. 

vSAN stretch operation started  

[**IMPORTANT**] Please make sure passwords are correct for each esxi host!! 

* Please provide root user password for h005.rainpole.local :  

* Please provide root user password for h006.rainpole.local :  

* Please provide root user password for h007.rainpole.local :  

Api Response:{
  "taskId": "24cea75b-9fd1-4df7-ab65-be44e1284e59",
  "resourceId": "d16b6678-f346-4046-b757-c38dd1f97b72",
  "resourceType": "ESXI",
  "state": "IN_PROGRESS",
  "description": "Extends VxRail vSAN cluster from a single data site to two sites",
  "timestamp": 1550482008661,
  "id": "24cea75b-9fd1-4df7-ab65-be44e1284e59"
}

Once the workflow is triggered, the user is prompted for the root password of each ESXi host. The overall task status can be monitored from the SDDC Manager.

For more detailed steps and other related workflows, refer to the VMware Cloud Foundation on Dell EMC VxRail Administration Guide.