This week I was talking to a customer who wondered whether it would be possible to stretch VSAN across metro distance and at the same time replicate VMs to a third site at geo distance. I realized it isn’t something we have really spoken about so far, although it actually was part of our announcements. Maybe we did not emphasize it enough, or visualize it correctly for that matter. As I was recreating some of the diagrams for an upcoming presentation anyway, and I felt it was an interesting conversation and use case, I figured I would share it.
In this customer's case the second site, which is part of the metro distance stretched VSAN environment, is only 3 km away. A real disaster would wipe out both sites at the same time, and as flooding is not uncommon in their region it made sense to leverage a third site for DR purposes. The disaster recovery site in their case is about 250 km away, which makes a stretched environment more challenging (latency), and as such a different solution is needed. In this environment vSphere Replication could be leveraged to replicate from the “stretched VSAN” environment to the DR location. As of vSphere 6.0 Update 1 and Virtual SAN 6.1 you can set it to an RPO of 5 minutes. Note that this (for now) has been exclusively certified for Virtual SAN 6.1, and both the source and the target are expected to be Virtual SAN.
This environment could be designed and implemented with or without Site Recovery Manager. In the diagram I drew SRM in there as I believe that when a full site failure occurs a mechanism that allows you to automate / orchestrate a failover is truly priceless. Not just automation and orchestration, but also the ability to test a failover scenario without impacting the production workload, which happened to be one of the requirements for this customer.
During the conversation this customer shared an interesting take on the hardware implementation of this design. Their plan is to build a high performing stretched VSAN cluster with 12 hosts on each site and a witness in a small remote location (12+12+1). They would implement it using 2U servers with SAS 10K RPM 1.2TB drives and Intel SSDs. They are expecting to run a total of 600 VMs, which is about 25 VMs per host.
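The density figure above is simple arithmetic, and a quick sketch makes it easy to play with the numbers. The variable names below are mine and purely illustrative. The single-site figure assumes that all 600 VMs are restarted on the surviving site when one of the two data sites fails, which is an assumption on my part and not something the customer stated.

```python
# Figures taken from the post; variable names are illustrative only.
hosts_per_site = 12
data_sites = 2
total_vms = 600

# The witness holds no workload VMs, so only the 12+12 data hosts count.
data_hosts = hosts_per_site * data_sites
vms_per_host = total_vms / data_hosts

# Assumption: after a single-site failure every VM restarts on the other site.
vms_per_host_one_site = total_vms / hosts_per_site

print(f"{data_hosts} data hosts, {vms_per_host:.0f} VMs per host")
print(f"one site down: {vms_per_host_one_site:.0f} VMs per host")
```

In other words, under that assumption each site should be sized for roughly double the steady-state density.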
Their third site will be more aggressive though. They are only planning on having 6 hosts, which means 75 VMs per host. For those hosts they will be leveraging Intel SSDs again, but they are planning on using 4TB NL-SAS drives instead of the 1.2TB SAS drives. The key reason is lower cost, combined with the fact that they will not need the same performance during a full dual site outage. On top of that, not all VMs will need to be restarted immediately after a failure, and the VMs which are actively running on the DR site can be powered off.
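Taking the DR figures at face value, a small sketch shows what they imply for the number of VMs running at the third site. The customer did not give me that number explicitly, so treat the result as derived from the two stated figures, nothing more. Variable names are again illustrative.

```python
# Figures taken from the post; the derived values are not customer-stated.
dr_hosts = 6
dr_vms_per_host = 75
total_vms = 600

# VMs the DR site carries at the stated density, and the share of the total.
dr_vms = dr_hosts * dr_vms_per_host
dr_fraction = dr_vms / total_vms

print(f"{dr_vms} VMs at the DR site, {dr_fraction:.0%} of the total")
```

That lines up with the point that not every VM has to be restarted right after a full dual site outage.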
Personally I love having these types of conversations, as they truly show the flexibility that Virtual SAN offers.