A brand new white paper was just published. It was written by Lee Dilworth, Ken Werneburg, Frank Denneman, Stuart Hardman and me. It covers vSphere Metro Storage Cluster solutions and specifically looks at things from a VMware perspective. Enjoy!
- VMware vSphere Metro Storage Cluster (VMware vMSC) is a new configuration within the VMware Hardware Compatibility List. This type of configuration is commonly referred to as a stretched storage cluster or metro storage cluster. It is implemented in environments where disaster/downtime avoidance is a key requirement. This case study was developed to provide additional insight and information regarding operation of a VMware vMSC infrastructure in conjunction with VMware vSphere. This paper will explain how vSphere handles specific failure scenarios and will discuss various design considerations and operational procedures.
http://www.vmware.com/resources/techresources/10299
I am also aiming to have the Kindle/EPUB versions up soon and will let you know when they are released!
Ronny
Hi,
I’ve read the vMSC whitepaper and it’s excellent!
As I understand it, you’re using iSCSI in this configuration. On page 20, it says that the isolation response was configured to “leave powered on”. But on page 21, you suggest changing the isolation response to “power off” when using IP storage (iSCSI, NFS).
Is there a specific reason that you didn’t change the isolation response to “power off” since you’re using iSCSI?
thanks for clarification.
regards
Duncan
Yes, we left it as “leave powered on” to test the behavior. In general though, when using IP-based storage this is not what you want to do. Thanks for your comment and for taking the time to read it this thoroughly!
iwan 'e1' Rahabok
Thanks for a great technical paper. Very useful as I’m doing a stretched cluster now, using the Uniform Config.
With the uniform configuration, the limitation is that it is not a _true_ write on both datastores (LUNs). Read/write access to a LUN takes place on just 1 of the two arrays, and a synchronous mirror of the LUN is maintained in a hidden, read-only state on the second array. That means all VMs on 1 datastore must be on 1 LUN. This is ok, except when we have an HA event.
Say I have 4 hosts per DC, and each site runs 60 VMs. So we have 120 VMs on 8 hosts. If 2 hosts fail in DC 1, I’d prefer the 120 VMs to be distributed across the remaining 6 hosts. But what will happen here is that the 60 VMs of Site 1 will run on 2 hosts.
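To put numbers on that scenario, here is a minimal sketch of the per-host load arithmetic. It assumes VMs are spread evenly across the available hosts, which is a simplification (DRS balances on resource demand, not VM count), and the function and constant names are made up for illustration.

```python
def vms_per_host(vm_count: int, host_count: int) -> float:
    """Average number of VMs each available host carries (even spread assumed)."""
    if host_count <= 0:
        raise ValueError("host_count must be positive")
    return vm_count / host_count


# Figures taken from the scenario above; the names are hypothetical.
HOSTS_PER_SITE = 4
VMS_PER_SITE = 60
FAILED_HOSTS_SITE1 = 2

# Normal operation: each site runs its own 60 VMs on its own 4 hosts.
baseline = vms_per_host(VMS_PER_SITE, HOSTS_PER_SITE)

# Site affinity honoured after the failure: 60 VMs on the 2 surviving Site 1 hosts.
pinned = vms_per_host(VMS_PER_SITE, HOSTS_PER_SITE - FAILED_HOSTS_SITE1)

# Preferred outcome: all 120 VMs spread over the 6 surviving hosts.
spread = vms_per_host(2 * VMS_PER_SITE, 2 * HOSTS_PER_SITE - FAILED_HOSTS_SITE1)

print(f"baseline: {baseline:.0f} VMs/host")
print(f"site-pinned after failure: {pinned:.0f} VMs/host")
print(f"spread across both sites: {spread:.0f} VMs/host")
```

Under these assumptions the surviving Site 1 hosts carry twice their normal load when affinity is kept, versus a third more if the load is spread over both sites.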
I think here is what might happen on the above:
HA will kick in first and restart the affected VMs. So some VMs might end up on Site 2.
DRS then kicks in to fix the VM-Host affinity. So the Site 1 VMs will be vMotioned back to Site 1.
If performance changes, DRS might decide to override the “should” rule and migrate some Site 1 VMs to Site 2. These can be different VMs.
Another concern is total site failure. HA, unlike SRM, does not have rich dependency handling. Yes, the VMs might boot, but the services might not be functioning properly. I think vApps can help here, but I’m not certain if our best practice is to recommend a lot of vApps. Also, VM-Host rules do not understand vApps.
It seems to me a design best practice would be one of:
– Have a 2-node cluster. No DRS. This is because we’re running at 50%, so disabling DRS makes no practical impact.
– Have a large cluster. Say 8 nodes per site. This gives each site enough hosts to handle an unavailable host (be it planned or unplanned).
– Use Stretched Cluster for Tier 0 Apps, and SRM for Tier 1 Apps.
Do correct me if I’m wrong. And thanks from Singapore.
e1
Rick
Hey Duncan,
unfortunately, the page leading to the document “no longer exists”.
Can you please check the link and maybe update your posting? 🙂
Thanks heaps,
Rick
HamR
The document link is still broken.
Stacy Carter
Hi Duncan – another vMSC question/comment 🙂 Getting complaints about deployment taking a long time on vMSC compared to non-stretched clusters. I believe this is because DRS (fully automated) keeps putting the new VMs on hosts at the opposite site until the VM is added to the affinity group, and can’t add it to the group until after deployment. Would be nice if there was some sort of site awareness added to the deployment wizard so that we could choose a site when deploying (unless I’m missing something).
Ramy Mahmoud
Direct Link to the WP http://blogs.vmware.com/vsphere/files/2015/06/VMW-TMD-vSphr-Mtro-Strge-Clster-USLET-1.2.pdf