VMware Cloud Disaster Recovery

Preparing for Recovery, Getting Started with VMware Cloud DR – Part 3

Preparing for Recovery from Disasters

Once you have the VMware Cloud Disaster Recovery product deployed and the site protected, it’s a simple process to begin preparing for recovering from potential disasters.

In this post, we will look at the steps needed to get ready for DR: deploying and configuring the SDDC, creating a basic DR plan, and checking and testing that plan, which brings you one step closer to being ready to deal with an actual disaster event. These steps are documented in more detail in the product documentation linked at the end of this post.

Deploy the SDDC

There are a couple of things to consider when deploying the SDDC. The first is how big it needs to be and how long you want to keep it in service. This flexibility is one of the key aspects of the VMware Cloud Disaster Recovery solution aimed at reducing overall operational costs. The SDDC can range from a temporary 1-host option limited to 30 days all the way up to a full-scale, multi-cluster, multi-host deployment.

Deploy SDDC from VMware Cloud Disaster Recovery UI

With VMware Cloud Disaster Recovery, you can provision the SDDC just-in-time, in preparation for either testing or an actual emergency. Deploying the SDDC just-in-time saves on cloud compute costs when you are not dealing with a disaster, but recovery is typically delayed by a couple of hours while the SDDC is built, deployed and made usable. Also keep in mind that the built-in DR plan compliance checking, which runs every 30 minutes, cannot track the DR site if the SDDC does not exist.

Alternatively, you can keep your SDDC running in an always-available, pilot-light mode of operation: a small, always-on footprint of 2-3 cloud compute hosts. This speeds up testing and recovery operations, provides a running baseline for continuous compliance checking, and reduces the risk of operational drift between the production and DR sites. When an actual DR event occurs, you can quickly scale up from the pilot-light baseline by adding cloud compute hosts and clusters to reach the capacity you need.

In either case, to take advantage of the VMware Cloud DR Live Mount recovery methods and short recovery times, the SDDC must be deployed from the VMware Cloud DR service, as shown in the figure above.

Configure the SDDC

Once the SDDC is deployed from VMware Cloud Disaster Recovery, it is important to make any local customizations needed so that it matches the mappings in any DR plans already defined (DR plan mappings are covered below). For compliance checking, this includes resource groups, folders, networks, and tags. You may also need to make other changes to the SDDC, such as firewall configuration, external networks, or other VMC-specific settings. While the SDDC can also be managed through the VMC console, we recommend making these changes through the VMware Cloud Disaster Recovery UI whenever possible, as shown in the figure below. If you do manage or modify the SDDC from the VMC console, refer to the documentation for the settings you should not change.

Configure SDDC with VMware Cloud Disaster Recovery

It is helpful to keep track of the changes you make to the SDDC once it's deployed. Continuous compliance checking will pick up any discrepancies or operational drift between the two sites' configurations, and the errors it reports give some guidance on what needs to be fixed. Some SDDC connectivity and setup within the VMC on AWS environment is more complex to capture. For this, there is a VMware Fling that exports some of an SDDC's configuration so it can be reapplied later, which may be useful for your SDDC management tasks.
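Tracking changes between two configuration snapshots amounts to a simple diff. The following is a minimal Python sketch that assumes you have already captured each snapshot as a flat dictionary of setting names to values, for example by hand or from an export. The function name `diff_config` and the setting keys are hypothetical and do not reflect the Fling's or VMware's actual export format.

```python
from typing import Any, Dict, Tuple


def diff_config(
    baseline: Dict[str, Any], current: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (baseline_value, current_value)} for every setting that differs.

    A setting present in only one snapshot is reported with None on the other side.
    """
    drift: Dict[str, Tuple[Any, Any]] = {}
    # Walk the union of keys so added and removed settings are both reported.
    for key in sorted(baseline.keys() | current.keys()):
        before = baseline.get(key)
        after = current.get(key)
        if before != after:
            drift[key] = (before, after)
    return drift


if __name__ == "__main__":
    # Hypothetical snapshots of a recovery SDDC's settings.
    recorded = {"folder:dr-apps": "present", "network:dr-web": "10.10.1.0/24", "firewall:allow-ssh": True}
    observed = {"folder:dr-apps": "present", "network:dr-web": "10.10.2.0/24", "tag:tier": "gold"}
    for setting, (before, after) in diff_config(recorded, observed).items():
        print(f"{setting}: {before!r} -> {after!r}")
```

Keeping a diff like this after each change gives you your own record to compare against what the compliance checks report.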

Create a Basic DR Plan

Now that you have your Protected site configured, recovery points scheduled and replicating to the Scale-out Cloud File System (SCFS), and the SDDC deployed and configured, it’s time to build the DR plan. Go to the DR plans option in the VMware Cloud Disaster Recovery UI and create a new plan. A few considerations will make DR plan construction easier and more accurate.

Create New DR Plan

First, if the SDDC has not been deployed yet, you can still begin creation of the DR plan and fill in the mapping details later. You can see the unselected option in the figure above.

Second, each Protection group you select for this plan to orchestrate contains one or more VMs in its inventory. The plan needs a mapping and a recovery action defined for every VM in the plan. One caveat to remember is that site-to-site mappings must be 1:1, meaning that mappings of compute resources, folders, and networks cannot be shared when you configure the DR plan details.
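To illustrate the 1:1 rule, the sketch below flags any recovery-site target that more than one protected-site resource maps to. It is a hypothetical Python model that assumes the plan's mappings are held as plain dictionaries per category (compute, folders, networks). It is not how the VMware Cloud DR service stores or validates mappings, and the resource names are made up.

```python
from collections import defaultdict
from typing import Dict, List


def find_shared_targets(mappings: Dict[str, Dict[str, str]]) -> List[str]:
    """Report recovery-site targets shared by more than one protected-site resource.

    mappings: category (e.g. "networks") -> {protected-site resource: recovery-site resource}
    An empty result means every mapping in every category is 1:1.
    """
    problems: List[str] = []
    for category, pairs in sorted(mappings.items()):
        # Invert the mapping: collect every source that points at each target.
        sources_by_target: Dict[str, List[str]] = defaultdict(list)
        for source, target in pairs.items():
            sources_by_target[target].append(source)
        for target, sources in sorted(sources_by_target.items()):
            if len(sources) > 1:
                problems.append(f"{category}: {', '.join(sorted(sources))} map to {target}")
    return problems


if __name__ == "__main__":
    plan_mappings = {
        "networks": {"VM Network": "sddc-cgw-network-1", "App-VLAN": "sddc-cgw-network-1"},
        "folders": {"Prod/Web": "DR/Web"},
    }
    print(find_shared_targets(plan_mappings))
    # ['networks: App-VLAN, VM Network map to sddc-cgw-network-1']
```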

The UI helps guide you through some of this configuration with its built-in highlighting, but it's good practice to have most of the site configurations in place before the mapping steps of plan construction. It may take a couple of iterations to connect all of the site-to-site mappings (e.g., folders, virtual networks) and to define the recovery action steps for entire Protection groups or individual VMs. We have found that well-defined mappings for the site-to-site failover make it easier to maintain operational consistency between your on-premises vCenter and the SDDC in VMC on AWS.

Next, determine the granularity and order of the recovery steps and capture them in the DR plan. You can recover individual VMs as well as entire Protection groups. The plan also records each VM's power-on state as part of the run-book documentation. As a starting point, it may be safer not to power on the VMs under plan control until later iterations of the plan. VMs that stay powered off are unlikely to conflict with other running components.
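The order of recovery steps and the power-on choices in a plan can be pictured as an ordered list. The sketch below is a hypothetical Python model. The `RecoveryStep` class and the helper functions are illustrative, not part of any VMware API. Every step defaults to powered off, as suggested above, and the model reports plan VMs that no step recovers.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class RecoveryStep:
    """One hypothetical run-book step: recover a Protection group or a single VM."""
    target: str              # Protection group or VM name
    is_group: bool = False
    power_on: bool = False   # safer default while the plan is still being iterated


def run_book(steps: List[RecoveryStep]) -> List[str]:
    """Render the steps, in order, as numbered run-book lines."""
    lines = []
    for number, step in enumerate(steps, start=1):
        kind = "Protection group" if step.is_group else "VM"
        state = "power on" if step.power_on else "leave powered off"
        lines.append(f"{number}. Recover {kind} {step.target}, {state}")
    return lines


def uncovered_vms(
    steps: List[RecoveryStep], group_members: Dict[str, List[str]], plan_vms: Set[str]
) -> Set[str]:
    """Return plan VMs that no step recovers, directly or through their Protection group."""
    covered: Set[str] = set()
    for step in steps:
        if step.is_group:
            covered.update(group_members.get(step.target, []))
        else:
            covered.add(step.target)
    return plan_vms - covered


if __name__ == "__main__":
    groups = {"db-tier": ["db01", "db02"]}
    steps = [RecoveryStep("db-tier", is_group=True), RecoveryStep("app01")]
    print("\n".join(run_book(steps)))
    print(uncovered_vms(steps, groups, {"db01", "db02", "app01", "web01"}))  # {'web01'}
```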

Lastly, when getting started, it's best to iterate over the plan until it performs the right steps, in the right order and with the right configuration. Save the more detailed customization steps for later, once the basics are worked out and defined. These include custom IP conversions, special scripted actions, separate mappings for failover and test execution (e.g., test bubbles), and any special timing or user synchronization details desired in the final plan.

Check the DR Plan

Compliance checks will automatically run every 30 minutes for any active plan. You can also run a compliance check manually and review and resolve any issues as you develop the plans or make changes in the protected or recovery sites. If there are compliance check errors, you can get more information from the report in order to fix the plan.

As an example, suppose you forget to map the virtual networks for the VMs in your plan. The compliance check will not pass as seen by the status shown below.

Continuous Compliance Checking

If you view the results of the check, you can see that a virtual network mapping for the originating protected site is missing, as shown in the figure below.

Continuous Compliance Details
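Conceptually, this failed check compares the networks each protected VM is attached to against the plan's network mappings. The following Python sketch models only that one comparison and assumes the VM-to-network and network mapping data are available as dictionaries. It is an illustration, not the service's actual compliance logic, and the function and network names are hypothetical.

```python
from typing import Dict, List


def missing_network_mappings(
    vm_networks: Dict[str, List[str]], network_map: Dict[str, str]
) -> Dict[str, List[str]]:
    """Return {protected-site network: [VMs attached to it]} for unmapped networks.

    vm_networks: VM name -> protected-site networks the VM is attached to
    network_map: protected-site network -> recovery-site network
    """
    missing: Dict[str, List[str]] = {}
    for vm, networks in sorted(vm_networks.items()):
        for network in networks:
            if network not in network_map:
                missing.setdefault(network, []).append(vm)
    return missing


if __name__ == "__main__":
    vms = {"web01": ["VM Network"], "db01": ["VM Network", "Backup-Net"]}
    mapping = {"Backup-Net": "sddc-backup"}
    print(missing_network_mappings(vms, mapping))
    # {'VM Network': ['db01', 'web01']}
```

Grouping the results by network, rather than by VM, mirrors how the fix is made: one new mapping resolves the error for every VM attached to that network.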

Test Your Plan

Once the plan has passed the compliance checks, you are almost ready to run a DR plan test. Before the plan can be executed, at least one recovery point for each Protection group defined in the plan must be present in the Scale-out Cloud File System. If the Protection group policy schedule has not run yet, or the initial protected copy has not finished transferring to the cloud, you may see an error such as the one below when you try to run a test plan execution.

Recovery Point Not Ready

You can monitor the Protection group status in the Monitor -> Protection view or even in the Running Tasks or Recently Finished Tasks area of the UI dashboard.
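The readiness condition described above (at least one fully replicated recovery point per Protection group in the plan) can be expressed as a small pre-flight check. This Python sketch assumes you have gathered, for each Protection group, the time of its newest recovery point fully replicated to the SCFS, or `None` if there is none yet. The function, data shape, group names, and timestamps are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def groups_not_ready(
    plan_groups: List[str],
    latest_replicated: Dict[str, Optional[datetime]],
) -> List[str]:
    """Return the plan's Protection groups that have no fully replicated recovery point.

    latest_replicated maps a Protection group name to the time of its newest
    recovery point fully replicated to the SCFS, or None if none exists yet.
    Groups missing from the dictionary are treated as not ready.
    """
    return [group for group in plan_groups if latest_replicated.get(group) is None]


if __name__ == "__main__":
    status = {
        "web-tier": datetime(2024, 1, 15, 2, 0),
        "db-tier": None,  # initial copy still transferring
    }
    print(groups_not_ready(["web-tier", "db-tier", "app-tier"], status))
    # ['db-tier', 'app-tier']
```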

When testing a DR plan, it is usually recommended to skip the background Storage vMotion of the VMs at the end of the test run and leave them running on the Live Mount of the SCFS. This is fine for most initial testing activities, and the full storage migration can be tested later once the basic DR plan is ready. Leaving the VMs on the SCFS also reduces test cycle time and minimizes the use of cloud resources. This option is one of the Runtime settings you choose when launching the DR plan test execution run, as shown below.

Faster Testing Cycle Times

Monitor and Report on DR Readiness

Now that you have:

  • The SDDC deployed for the initial configuration and testing
  • Replicated at least the first recovery point to the VMware Cloud DR system
  • Constructed the initial DR plan
  • Checked plan compliance
  • Run an initial plan test execution

you can download the run-book that is created with each test or actual plan execution. Open the Reports menu on the detail view for the DR plan of interest, select the report from the list, and download the PDF for review.

You are now ready for recovery operations based on the details in the DR plan run-book report. Use this as the basis for an iterative approach to DR plan refinement, automated checking and frequent testing with your VMware Cloud Disaster Recovery solution.

Useful Resource Links

Getting Started – Part 2 – previous