vCenter High Availability Deep Dive – Part 2

The first post in this series covered vCenter High Availability deployment types and considerations. Hopefully, by now you’ve gotten your VCHA cluster deployed and are ready to move on with some of the operational aspects such as backup & restore, patching, and upgrades to the VCHA cluster. The continuation of this blog series will dive into each feature and provide guidance.

Backup

Starting with vSphere 6.5, the vCenter Server Appliance's Virtual Appliance Management Interface (VAMI) includes a native file-based backup. With a VCHA deployment it can be complicated to know which node is active and to back it up properly. By using the built-in file-based backup through the VAMI, you will always be backing up the active node with the latest data.

The file-based backup utility supports backup of the vCenter Server Appliance or Platform Services Controller and is supported for both embedded and external PSC deployments. To start the backup workflow, log into the VAMI of the VCSA or PSC and choose the backup button in the bottom left corner (6.7) or on the summary screen (6.5).

When file-based backups are done through the VAMI, a few backup targets are available: FTP, FTPS, HTTP, HTTPS, and SCP. Once your backup target is configured, we can begin to configure our backup details.

By default, only the inventory and configuration of the vCenter Server are backed up. You also have the option to back up stats, events, and tasks, but keep in mind that this will increase both the backup size and the time the backup takes. When performing a file-based backup you can also choose to encrypt the backup data, which uses AES 256 encryption. Once the backup is complete, all the files needed for a restore will be available on the backup target.
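If you keep your backup settings in a script or runbook, the choices above can be captured in a small helper. The following is a minimal Python sketch, not a definitive implementation: the function name `build_backup_request` and every dictionary key are hypothetical and are not taken from any VMware API. It only validates the target protocol against the list above and records which optional pieces were selected.

```python
from urllib.parse import urlparse

# Protocols this post lists as valid file-based backup targets.
SUPPORTED_PROTOCOLS = {"FTP", "FTPS", "HTTP", "HTTPS", "SCP"}


def build_backup_request(location, user, password,
                         include_stats_events_tasks=False,
                         encryption_password=None):
    """Describe a file-based backup job as a plain dictionary.

    All key names are illustrative assumptions, not a confirmed API schema.
    """
    scheme = urlparse(location).scheme.upper()
    if scheme not in SUPPORTED_PROTOCOLS:
        raise ValueError(
            f"unsupported backup target protocol: {scheme or 'none'}")

    # Inventory and configuration are always part of the backup.
    parts = ["inventory_and_configuration"]
    if include_stats_events_tasks:
        # Optional: increases backup size and duration.
        parts.append("stats_events_tasks")

    request = {
        "location_type": scheme,
        "location": location,
        "location_user": user,
        "location_password": password,
        "parts": parts,
    }
    if encryption_password:
        # This same password is required again at restore time.
        request["backup_password"] = encryption_password
    return request
```

For example, `build_backup_request("scp://backup.example.com/vcsa", "svc-backup", "secret")` describes a default backup of inventory and configuration only, while a location such as `nfs://...` raises a `ValueError` because NFS is not among the targets listed above.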

 

A vCenter Server Appliance File-Based Backup Walkthrough (6.5) is available here.

 

Restore

As we discussed in part 1, you need a minimum of two nodes for a VCHA cluster to be online and functioning. So, if you are performing a restore, how does this work? Before we start the restore, we need to power off and delete any existing cluster nodes. When we restore a VCHA vCenter, only the primary node is restored, as a standalone vCenter Server Appliance. This means that once your restore is complete, you will need to deploy your VCHA cluster again.

If you have a failure of an external Platform Services Controller (PSC) and no other PSC is available in the SSO domain, your only option is to restore from backup. However, since we are talking about VCHA, at least two PSCs behind a load balancer are required, so it's highly unlikely you would lose both at the same time.

To begin your restore, you will need the ISO of the vCenter Server version you were on, as well as the backup target information such as the location, username, password and path. If you chose to encrypt your backup, you will also need to know the password used for encryption. Without this password, the restore of an appliance will be unsuccessful.
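The prerequisites above lend themselves to a quick pre-flight check before you start the restore. This is only a sketch in Python under stated assumptions: the `RestoreInputs` class, its field names, and `missing_prerequisites` are all hypothetical helpers invented for illustration, and the version comparison is a simple string match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RestoreInputs:
    """Hypothetical container for what you gather before a restore."""
    iso_version: str       # version of the installer ISO you have on hand
    backup_version: str    # vCenter Server version that produced the backup
    location: str          # backup target, e.g. an SCP or FTPS server
    path: str              # path to the backup on that target
    username: str
    password: str
    old_nodes_removed: bool = False   # existing VCHA nodes powered off and deleted
    backup_encrypted: bool = False
    encryption_password: Optional[str] = None


def missing_prerequisites(inputs: RestoreInputs) -> List[str]:
    """Return a list of problems; an empty list means ready to restore."""
    problems = []
    if not inputs.old_nodes_removed:
        problems.append("existing cluster nodes must be powered off and deleted")
    if inputs.iso_version != inputs.backup_version:
        problems.append("installer ISO does not match the backed-up version")
    for name in ("location", "path", "username", "password"):
        if not getattr(inputs, name):
            problems.append(f"backup target {name} is missing")
    if inputs.backup_encrypted and not inputs.encryption_password:
        problems.append("encryption password is missing")
    return problems
```

An empty list from `missing_prerequisites` means every item named above is accounted for; anything else is worth resolving before you boot the installer ISO.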

When the vCenter Server is restored it retains the same UUID and FQDN as the original, so no additional configurations are needed.

A vCenter Server Appliance File-Based Restore Walkthrough is available here.

Patching

There are two ways to patch a VCHA cluster. Depending on your processes and maintenance windows, you can choose which way you want to perform the update.

If you are using a VCHA Basic deployment, destroying the VCHA configuration can be easier and take less time overall. Once the cluster configuration is destroyed, you can patch the vCenter Server through the VAMI and then redeploy VCHA.

However, if you need to minimize downtime or have a VCHA Advanced deployment, the steps to patch a cluster are outlined below. With this method you will only have an outage of roughly 5 minutes during the failover process.

The first step is to download the patch. Note that patches are not downloaded from my.vmware.com. Instead, navigate to the VMware Patch Download Center, select VC from the Search by Product drop-down, and then choose vSphere 6.5 or vSphere 6.7. You will then see the option to download the VMware-vCenter-Server-Appliance-6.x.xxxx-patch-FP.iso file.

Now that we have the patch file we need, we can proceed with the actual patching.

  • Log into the vSphere Web Client and put vCenter HA into Maintenance Mode.
    • Maintenance mode means replication still occurs but automatic failover is disabled.

From here you can follow the official documentation to Patch a vCenter High Availability Environment. The high-level process is to patch the witness, then the passive node, perform a failover, and then patch the new passive node.
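The ordering matters here, so it can help to write it down explicitly. Below is a minimal Python sketch of the sequence described above; the function name `vcha_patch_plan` and the step labels are hypothetical, and it only models the steps named in this post. The official documentation remains the authority for the actual commands.

```python
def vcha_patch_plan(active, passive, witness):
    """Return (steps, final_roles) for patching a three-node VCHA cluster.

    Each step is an (action, node) tuple. Node names are whatever labels
    you use for your own appliances.
    """
    steps = [
        # Replication continues, but automatic failover is disabled.
        ("enter_maintenance_mode", None),
        ("patch", witness),
        ("patch", passive),
        # The freshly patched passive node takes over as active.
        ("failover", passive),
        # The original active node is now passive and can be patched.
        ("patch", active),
    ]
    final_roles = {"active": passive, "passive": active, "witness": witness}
    return steps, final_roles


if __name__ == "__main__":
    plan, roles = vcha_patch_plan("vcsa-01", "vcsa-02", "vcsa-witness")
    for number, (action, node) in enumerate(plan, start=1):
        print(number, action, node or "")
    print("roles after patching:", roles)
```

Note that the returned roles show the cluster running on what was originally the passive node, which is why a fail back is a separate decision once patching is done.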

Which method you choose is up to you, but it is good to know there are multiple options available.

One thing to note: after patching is complete, we will now be running on the previous passive node. Depending on your VCHA deployment topology, it is at your discretion whether to fail back to the original node.

Upgrade

When upgrading a VCHA cluster from vSphere 6.5 to 6.7, VCHA must be destroyed. Unlike patching, where you can update each node independently, an upgrade cannot be performed node by node.

If VCHA was deployed using the Basic workflow, destroying the cluster is easy and automated: it will automatically shut down and delete the passive and witness nodes. If VCHA was deployed using the Advanced workflow, you will need to remove the VCHA configuration and then manually shut down and delete the passive and witness nodes.

Once your vCenter is upgraded, you can then proceed to deploy your VCHA cluster again.
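To make the difference between the two workflows concrete, here is a small Python sketch that lists the upgrade steps for each deployment type. The function name `vcha_upgrade_plan` and the step wording are hypothetical and simply restate the process described above.

```python
def vcha_upgrade_plan(deployment_type):
    """Return the ordered 6.5 to 6.7 upgrade steps for a VCHA deployment type."""
    kind = deployment_type.strip().lower()
    if kind == "basic":
        # Destroying a Basic deployment cleans up the extra nodes for you.
        teardown = [
            "destroy VCHA (passive and witness nodes are shut down "
            "and deleted automatically)",
        ]
    elif kind == "advanced":
        # An Advanced deployment leaves node cleanup to the administrator.
        teardown = [
            "remove the VCHA configuration",
            "manually shut down the passive and witness nodes",
            "manually delete the passive and witness nodes",
        ]
    else:
        raise ValueError(f"unknown VCHA deployment type: {deployment_type!r}")
    return teardown + [
        "upgrade vCenter Server from 6.5 to 6.7",
        "redeploy the VCHA cluster",
    ]
```

Calling `vcha_upgrade_plan("Basic")` yields three steps and `vcha_upgrade_plan("Advanced")` yields five, which reflects the extra manual cleanup an Advanced deployment requires.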

Conclusion

vCenter High Availability is a high availability solution and not a disaster recovery solution. This is evidenced by the fact that VCHA only protects vCenter Server and not the workloads or hosts being managed by it. When deploying VCHA try to use the Basic workflow when at all possible to make the solution easier to maintain and deploy. Protect vCenter Server within a site as you are much more likely to suffer a failure from hardware, network, or storage than a total site failure.