The VMware Cloud Foundation Instance Recovery solution provides guidance on recovering a VMware Cloud Foundation (VCF) instance from absolute zero to a fully recovered environment. The process provides detailed instructions on recovering an entire VCF instance, including the management domain and VI workload domains, where you must recover all components.
The guide provides step-by-step manual instructions for recovering your VMware Cloud Foundation instance, together with comprehensive automation in the form of a PowerShell module. The module expedites recovery and removes much of the complexity of the manual process by using data from the VCF SDDC Manager inventory to reconstruct configurations. This alleviates the need to refer to as-built documentation, which can easily become stale given the ever-expanding and contracting nature of a complex Software-Defined Data Center.
Use Cases
Examples of scenarios where you might need to use this process are:
- Complete site failure
- Recovery from a malware or ransomware attack
- Catastrophic logical corruption
This is particularly important for industries that must meet regulatory requirements (such as the Digital Operational Resilience Act (DORA) in the European Union).
A little about DORA
- DORA is a European Union (EU) regulation that entered into force on 16 January 2023 and that created a binding, comprehensive information and communication technology (ICT) risk management framework for the EU financial sector.
- DORA establishes technical standards that financial entities and their critical third-party technology service providers must implement in their ICT systems by 17 January 2025.
- Entities will also need to establish business continuity and disaster recovery plans for various cyber risk scenarios, such as ICT service failures, natural disasters, and cyber attacks. These plans must include data backup and recovery measures, and system restoration processes.
While DORA is a European regulation, its reach extends to businesses that operate in the EU regardless of where their headquarters reside. Most importantly, DORA is an example of a type of regulation that is going to become more commonplace over the next couple of years.
Recovering a VCF Instance is not just a Paper Exercise
Regulations put non-trivial responsibilities on businesses such as financial services firms and related third-party tech suppliers to have robust plans to respond to failures of their systems.
Enterprises will need to conduct periodic testing of their plans, tools and systems to demonstrate their ability to recover business critical infrastructure from system failures in a timely and repeatable manner.
Solution Summary
The VMware Cloud Foundation Instance Recovery solution combines restore, recovery, and rebuild processes to re-instantiate a VCF Instance to exactly the same configuration, even if the underlying hardware and the datacenter it resided in have been lost.
High-Level Steps:
- Rebuild VMware vSphere hosts using the same or new hardware, informed by data extracted from the backup of your VCF SDDC Manager inventory
- Perform a partial deployment of VCF
- Restore VMware vCenter and NSX Manager Instances as well as SDDC Manager
- Reconstruct vSphere clusters including their network configurations and settings
- Recover NSX Edges
- Restore Workloads
- Restore Workload settings (DRS groups, vSphere tags, and inventory locations)
Recovery Timeline for the VMware Cloud Foundation Instance
To minimize the time of overall recovery in VMware Cloud Foundation, recovery tasks can be performed across multiple workload domains by following an overlapping timeline, adapted for your setup. The timeline is for the following example setup:
- 3 x VI workload domains.
- VI Workload Domain 1 and VI Workload Domain 2 are in the same vCenter Single Sign-On (SSO) domain as the management domain. They are in Enhanced Linked Mode (ELM).
- VI Workload Domain 3 is in an isolated vCenter Single Sign-On domain (VMware Cloud Foundation 5.x only).
- The restore pattern for VI workload domains in the same SSO domain can be extended if more VI workload domains are connected to the management vCenter Single Sign-On domain.
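To illustrate why an overlapping timeline shortens the overall recovery, here is a minimal Python sketch that compares a strictly sequential recovery with one where tasks start as soon as their prerequisites finish. The task names, durations, and dependencies are all assumptions made up for illustration. They are not taken from the guide, and the sketch also assumes enough people and automation to run independent tasks at the same time.

```python
# Hypothetical task durations in minutes; real values depend on your environment.
DURATION = {
    "mgmt_rebuild_hosts": 30,
    "mgmt_restore": 40,
    "wld1_rebuild_hosts": 30,
    "wld1_restore": 25,
    "wld2_rebuild_hosts": 30,
    "wld2_restore": 25,
    "wld3_rebuild_hosts": 30,
    "wld3_restore": 25,
}

# Assumed dependencies: a task can start once all of its prerequisites finish.
# Here every workload domain restore waits for the management domain restore.
DEPENDS_ON = {
    "mgmt_rebuild_hosts": [],
    "mgmt_restore": ["mgmt_rebuild_hosts"],
    "wld1_rebuild_hosts": [],
    "wld1_restore": ["wld1_rebuild_hosts", "mgmt_restore"],
    "wld2_rebuild_hosts": [],
    "wld2_restore": ["wld2_rebuild_hosts", "mgmt_restore"],
    "wld3_rebuild_hosts": [],
    "wld3_restore": ["wld3_rebuild_hosts", "mgmt_restore"],
}


def finish_time(task):
    """Earliest finish time of a task when independent tasks overlap."""
    start = max((finish_time(dep) for dep in DEPENDS_ON[task]), default=0)
    return start + DURATION[task]


# One task after another, with no overlap at all.
sequential_total = sum(DURATION.values())

# Overlapping timeline: the recovery ends when the last task finishes.
overlapped_total = max(finish_time(task) for task in DURATION)

print(f"Sequential: {sequential_total} min, overlapped: {overlapped_total} min")
```

With these made-up numbers, host rebuilds for the workload domains run while the management domain is still being restored, so only the workload domain restores have to wait. Adding another workload domain to the two dictionaries extends the pattern in the same way.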
PowerShell Automation
The automation is delivered as a PowerShell module named VMware.CloudFoundation.InstanceRecovery, a comprehensive set of cmdlets that removes the tedious and error-prone work of reconstructing a potentially complex and sizeable Software-Defined Data Center.
This is particularly useful where tasks are repeated many times, such as per ESXi host or per recovered virtual machine.
The process relies on its ability to extract data from the SDDC Manager backup you intend to restore from. This means the automation can restore to the latest viable backup without relying on manual documentation being kept up to date.
Example of Extracting Configuration Data from SDDC Manager Backup for use in Recovery
Once extracted, every step of the process leverages this data to guide and automate the reconstruction.
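As a rough illustration of the idea (not the module's actual implementation, which is written in PowerShell), the Python sketch below shows how extracted inventory data could drive repetitive per-host work. The JSON layout, field names, and host names are hypothetical. The real SDDC Manager backup contents and schema will differ.

```python
import json

# Hypothetical, simplified extract. The real backup schema will differ.
EXTRACTED_INVENTORY = """
{
  "domains": [
    {"name": "mgmt", "type": "MANAGEMENT",
     "clusters": [{"name": "mgmt-cl01",
                   "hosts": ["esx01.example.local", "esx02.example.local"]}]},
    {"name": "wld01", "type": "VI",
     "clusters": [{"name": "wld01-cl01",
                   "hosts": ["esx11.example.local"]}]}
  ]
}
"""


def hosts_to_rebuild(inventory):
    """Flatten the inventory into one (domain, cluster, host) tuple per host."""
    return [
        (domain["name"], cluster["name"], host)
        for domain in inventory["domains"]
        for cluster in domain["clusters"]
        for host in cluster["hosts"]
    ]


inventory = json.loads(EXTRACTED_INVENTORY)
plan = hosts_to_rebuild(inventory)

# Each entry becomes one repeatable unit of work, driven by the data
# rather than by hand-maintained documentation.
for domain, cluster, host in plan:
    print(f"rebuild {host} -> cluster {cluster} (domain {domain})")
```

The point of the design is that the list of work items comes from the backup itself, so it reflects the environment as it was at backup time.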
In lab environments, full VCF instances, including the management domain and VI workload domains, were recovered in as little as two hours. Many of the tasks for additional workload domains can be done in parallel or in an overlapping fashion to minimize the overall instance recovery time.
The solution has already been tested in a lab environment by one of VCF's largest customers, who are very excited about what it offers in terms of meeting their regulatory requirements.
We plan to further extend the automation and processes to support additional topologies, configurations, and technologies, so watch this space!
For more information, the guide is available here: https://docs.vmware.com/en/VMware-Cloud-Foundation/services/vcf-recovery/GUID-ACBFFD0F-F8CA-47D9-990B-7B0975BCDF5A.html
About the Authors
Ken Gould is a 30-year veteran of the technology industry across EMC/Dell/VMware. He currently works as a Staff II Solution Architect in the VMware Cloud Foundation Division at Broadcom. Before that, he was lead architect on Enterprise Hybrid Cloud (EHC) at Dell, as well as on VVS and VVD at VMware. He writes a lot of PowerShell automation 🙂
Brian is a Staff II Solution Architect within the VMware Cloud Foundation Division at Broadcom. He has worked on VMware Validated Designs (VVD), VMware Validated Solutions (VVS) & now VMware Cloud Foundation (VCF).