Site Recovery Manager

How fast can you recover 1000 VMs with SRM?


To answer the question in the title: very fast. Here is a 2-minute video I recently recorded showing SRM recovering 1000 VMs in less than 26 minutes. This was done with SRM 6.5, vSphere Replication 6.5 and vSAN 6.5. The point is not to show the results you will achieve in your own environment, but rather what vSphere, vSAN, vCenter, SRM and vSphere Replication (VR) are capable of.

 

Details as follows:

Hosts:

  • 4 hosts at recovery site
  • Two sockets with 10-core processors and 256 GB of memory each
  • vSAN 6.5 – All Flash
  • 2 disk groups per host (1 cache/5 capacity disks each)

VMs:

  • 1000 Debian Linux VMs with VMware Tools installed
  • 128 MB Memory/1 GB HD
  • Replicated with vSphere Replication – RPO 1 hour

Recovery Plan:

  • 10 Protection Groups – 100 VMs each
  • 200 VMs in each of the 5 Priority Groups (all VMs in a Priority Group must start before the VMs in the next Priority Group are started)
  • No IP customization
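
As a rough back-of-the-envelope check, the figures listed above can be turned into a few derived numbers. This is only a sketch: the variable names are illustrative, the 26 minutes is treated as an upper bound (the recovery took less than that), and the per-priority-group average naively assumes the time was split evenly across groups, which the post does not state.

```python
# Figures taken from the setup described above; names are illustrative only.
TOTAL_VMS = 1000
RECOVERY_MINUTES = 26          # upper bound: the recovery took "less than 26 minutes"
HOSTS = 4
HOST_MEMORY_GB = 256
VM_MEMORY_MB = 128
PROTECTION_GROUPS = 10
VMS_PER_PROTECTION_GROUP = 100
PRIORITY_GROUPS = 5
VMS_PER_PRIORITY_GROUP = 200

# Both groupings should account for every VM in the recovery plan.
assert PROTECTION_GROUPS * VMS_PER_PROTECTION_GROUP == TOTAL_VMS
assert PRIORITY_GROUPS * VMS_PER_PRIORITY_GROUP == TOTAL_VMS

# Lower bound on throughput, since 26 minutes is an upper bound on duration.
vms_per_minute = TOTAL_VMS / RECOVERY_MINUTES

# VM density on the recovery cluster.
vms_per_host = TOTAL_VMS / HOSTS

# Configured guest memory versus physical memory in the cluster.
configured_memory_gb = TOTAL_VMS * VM_MEMORY_MB / 1024
cluster_memory_gb = HOSTS * HOST_MEMORY_GB

# Naive assumption: recovery time split evenly across priority groups.
avg_minutes_per_priority_group = RECOVERY_MINUTES / PRIORITY_GROUPS

print(f"At least {vms_per_minute:.1f} VMs recovered per minute")
print(f"{vms_per_host:.0f} VMs per host")
print(f"{configured_memory_gb:.0f} GB configured guest memory "
      f"of {cluster_memory_gb} GB in the cluster")
print(f"About {avg_minutes_per_priority_group:.1f} minutes per priority group")
```

The small memory footprint of the guests is what makes 250 VMs per host feasible here; larger guests would change both the density and the boot times.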

Comments


  1. Let’s be fair. Most IT shops are not running all-flash setups or tiny 128 MB servers.
    What would a similar test look like with a mixed-storage vSAN and guests with 4 GB to 16 GB of memory?

  2. True. Also note that a customer wouldn’t run or recover 1000 VMs on 4 hosts. More people are running all-flash than you might think, and with the falling cost of flash and improved data efficiency options, that number will likely continue to grow. Also, for anyone looking for the lowest RTO, all-flash is a great help, because a recovery is often essentially a boot storm, which all-flash handles very well.

    Regarding what a similar test would look like with larger guests, that depends on a lot of things. How long does it take to boot the VMs? What do the dependencies and priority groups look like? Are IP addresses being customized (this entails an extra reboot)? For a rough comparison, check out this video (https://storagehub.vmware.com/#!/site-recovery-manager-3/site-recovery-manager-demonstrations/running-a-recovery-plan), which shows 100 Windows 2012 VMs being recovered in about 11 minutes. These VMs have 4 or 8 GB of memory and 100 GB on disk each.

  3. Another point is that there’s no IP customization. That makes it very easy, even with non-flash disks. I have already accomplished this with SRM 5.8, with Windows and Linux VMs that had more than 10 GB of RAM. Let’s add some complexity here! Hehe.

  4. Recovering VMs with a final power state of off and no IP customization greatly reduces recovery time. However, if IP customization is enabled, there is no easy way that I can find to temporarily set the mode to No IP Customization.
    The use case is to perform SRM function testing in the shortest time with minimal impact on the recovery clusters (which run heavy QA workloads). That’s about 1700 VMs we would have to reconfigure to No IP Customization one at a time.
    The next time we import our IP customization .csv file, they are all set back to manual mode with static recovery site IPs. Does anyone know how to automate changing the IP customization mode from Manual to No IP Customization?

    Here’s a trick to speed up recovery when using IP customization. If you enable memory reservations (all locked) on the placeholder VMs, the swap file is not created. This reduces I/O on the recovery datastores when the VMs power on for IP customization. Of course, the hosts must have that much usable memory available. You can also create DRS rules to govern the placeholder VMs. We use both methods on monster VM clusters.
