
Stretch Across Metro, Replicate Across Geo… Go VSAN!


This week I was talking to a customer and they were wondering if it would be possible to both stretch VSAN across metro distance and at the same time replicate VMs to a third site at geo distance. I realized it isn’t something we have really spoken about so far, but it actually was part of our announcements. Maybe we did not really emphasize it enough, or visualize it correctly for that matter. As I was recreating some of the diagrams for an upcoming presentation, and felt it was an interesting conversation and use case, I figured I would share it.

[Diagram: stretched VSAN across two metro sites, with vSphere Replication and SRM to a third DR site]

This customer's second site, which is part of the metro-distance stretched VSAN environment, is only 3 km away. A real disaster would wipe out both sites at the same time, and as floods are not uncommon in their region it made sense to leverage a third site for DR purposes. The disaster recovery site in their case is about 250 km away, which makes a stretched environment more challenging (latency), and as such a different solution is needed. In this environment vSphere Replication could be leveraged to replicate from the “stretched VSAN” environment to the DR location. As of vSphere 6.0 Update 1 and Virtual SAN 6.1 you can set it to an RPO of 5 minutes. Note that this has (for now) been certified exclusively for Virtual SAN 6.1, and both the source and the target are expected to be Virtual SAN.

This environment could be designed and implemented with or without Site Recovery Manager. In the diagram I drew SRM in there as I believe that when a full site failure occurs a mechanism that allows you to automate / orchestrate a failover is truly priceless. Not just automation and orchestration, but also the ability to test a failover scenario without impacting the production workload, which happened to be one of the requirements for this customer.

During the conversation with this customer they had an interesting take on the hardware implementation of this design. Their plan is to build a high-performing stretched VSAN cluster: 12 hosts in each site and a witness in a small remote location (12+12+1). They would implement it using 2U servers with 1.2TB 10K RPM SAS drives and Intel SSDs. They are expecting to run a total of 600 VMs, which is about 25 VMs per host.

Their third site will be more aggressive though. They are only planning on having 6 hosts, which means 75 VMs per host. For those hosts they will again be leveraging Intel SSDs, but they are planning on using 4TB NL-SAS drives instead of the 1.2TB SAS drives. The key reasons are lower cost and the fact that they will not need the same performance during a full dual site outage. On top of that, not all VMs will need to be restarted immediately after a failure, and the VMs which are actively running on the DR site can be powered off.
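
To put rough numbers on the two sites, here is a quick back-of-the-envelope sketch in Python. The host and VM counts come from the conversation above; the variable names are purely illustrative. Reading "75 VMs per host" as roughly 450 of the 600 VMs running at the same time is an assumption on my part, based on the remark that not all VMs need to be restarted immediately.

```python
# Figures quoted in the post; names are illustrative, not from any VMware tool.
TOTAL_VMS = 600
STRETCHED_HOSTS = 12 + 12   # two data sites; the witness does not run VMs
DR_HOSTS = 6
DR_VMS_PER_HOST = 75        # density quoted for the DR site


def vms_per_host(vm_count: int, host_count: int) -> float:
    """Average number of VMs each host carries."""
    if host_count <= 0:
        raise ValueError("host_count must be positive")
    return vm_count / host_count


# Normal operations: all 600 VMs spread over the 24 stretched-cluster hosts.
stretched_density = vms_per_host(TOTAL_VMS, STRETCHED_HOSTS)

# DR site at the quoted density: how many VMs does that cover?
dr_capacity_vms = DR_HOSTS * DR_VMS_PER_HOST
dr_fraction = dr_capacity_vms / TOTAL_VMS

# Assumption check: density if every VM were restarted on the DR site at once.
full_restart_density = vms_per_host(TOTAL_VMS, DR_HOSTS)

print(f"Stretched cluster: {stretched_density:.0f} VMs per host")
print(f"DR site at 75 per host: {dr_capacity_vms} VMs ({dr_fraction:.0%} of total)")
print(f"DR site if all VMs restart: {full_restart_density:.0f} VMs per host")
```

In other words, the DR site runs at three times the density of the stretched cluster, and it would go to four times if every single VM had to come back at once.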

Personally I love having these types of conversations; they truly show the flexibility that Virtual SAN offers.



2 comments have been added so far

  1. This is a very interesting scenario from a Business Continuity and Disaster Recovery perspective, and it really highlights how robust and solid the solutions offered by VMware are.
    My only question is related to the default Storage Policy of the replicas on the DR site. What is the default Storage Policy used by the replicas? The VSAN default one? I don’t see the benefit of sacrificing VSAN space on the DR site to hold the replicas of the main site if the default SP is FTT=1.
    How is this integrated between VSAN and vSphere Replication?
    I would see a real benefit in using FTT=0 during replication and, should a failover occur, automating the SP change from FTT=0 to FTT=1 on the DR site for the VMs that have been recovered there. This would add an extra level of protection for the failed-over VMs.



  2. By default “FTT=1” is used, Nelson. I think it is a good suggestion, and I recommend filing a feature request. You know where 🙂
