
Say Hello to vMotion-compatible Shared-Disks Windows Clustering on vSphere

As you dive into the inner workings of the new version of VMware vSphere (aka ESXi), one of the gems you will discover to your delight is the enhanced virtual machine portability feature that allows you to vMotion a running pair of clustered Windows workloads that have been configured with shared disks.

I pause here now to let you complete the obligatory jiggy dance. No? You have no idea what I just talked about up there, do you? Let me break it down for you:
In vSphere 6.0, you can configure two or more VMs running Windows Server Failover Clustering (or MSCS for older Windows OSes), using common, shared virtual disks (RDM) among them AND still be able to successfully vMotion any of the clustered nodes without inducing failure in WSFC or the clustered application. What’s the big deal about that? Well, it is the first time VMware has ever officially supported such a configuration without any third-party solution, formal exception, or list of caveats. Simply put, this is now an official, out-of-the-box feature that does not have any exceptions or special requirements other than the following:
  • The VMs must be in “Hardware 11” compatibility mode – which means that you are either creating and running the VMs on ESXi 6.0 hosts, or you have converted your old template to Hardware 11 and deployed it on ESXi 6.0
  • The disks must be connected to virtual SCSI controllers that have been configured for “Physical” SCSI Bus Sharing mode
  • And the disk type *MUST* be of the “Raw Device Mapping” type. VMDK disks are *NOT* supported for the configuration described in this document.
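The three requirements above lend themselves to a simple pre-flight check. The following is a minimal Python sketch, not a VMware API: the `SharedDisk` and `ClusterNode` classes, their field names, and the sample values are all hypothetical stand-ins for whatever inventory data you export from your own environment. It also assumes that hardware versions later than 11 would qualify, which the text above does not state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SharedDisk:
    """Hypothetical record for one shared disk as attached to a VM."""
    label: str
    disk_type: str    # "rdm" or "vmdk"
    bus_sharing: str  # controller's SCSI bus sharing mode: "physical", "virtual" or "none"


@dataclass
class ClusterNode:
    """Hypothetical record for one clustered VM."""
    name: str
    hardware_version: int
    shared_disks: List[SharedDisk] = field(default_factory=list)


def vmotion_blockers(node: ClusterNode) -> List[str]:
    """List every requirement from the bullets above that this node does not meet."""
    problems = []
    # Assumption: anything at or above hardware version 11 is acceptable.
    if node.hardware_version < 11:
        problems.append(f"{node.name}: hardware version {node.hardware_version} is below 11")
    for disk in node.shared_disks:
        if disk.bus_sharing != "physical":
            problems.append(
                f"{node.name}/{disk.label}: bus sharing is '{disk.bus_sharing}', not 'physical'"
            )
        if disk.disk_type != "rdm":
            problems.append(f"{node.name}/{disk.label}: disk type is '{disk.disk_type}', not RDM")
    return problems


# Made-up example: an old hardware version and a VMDK used as the shared disk.
node = ClusterNode("sql-node-1", 10, [SharedDisk("quorum", "vmdk", "physical")])
for problem in vmotion_blockers(node):
    print(problem)
```

An empty list means the node meets all three requirements as modeled here; it says nothing about the rest of the cluster configuration.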
What is the value of this new feature?
Concurrent host and guest clustering provides a much-improved high availability option for virtualized workloads. It is a configuration that makes virtualization superior to a physical configuration, and it is something our customers have been demanding for a very long time. With this configuration, customers can provide application-level high availability using the Windows Failover Clustering feature, with which most Windows administrators are already familiar, while at the same time providing machine-level resilience for both the ESXi hosts and the virtual machines using vSphere HA and vMotion.
  • vSphere HA ensures availability in a vSphere cluster by ensuring that, in the event of unplanned outage affecting an ESXi host, the VMs running on that host are automatically powered on on other ESXi hosts in the cluster.
  • In the event of a planned outage, a vSphere administrator can move a VM from an ESXi host while the VM is up and running and its application and services continue to be accessible to the end-user or dependent process/service. This movement of the VM is done using vMotion.
  • vMotion can also be used (manually or automatically) for resource-balancing in a given vSphere cluster through the vSphere Distributed Resource Scheduler (DRS) feature.
These three features satisfy all the requirements for Windows VM portability defined by Microsoft. See the “Host-based failover clustering and migration for Exchange” section of the Exchange 2013 virtualization whitepaper.
Configuring shared-disk clustered VMs that support vMotion on vSphere 6.0 is not overly complicated. I shall now describe the process. (I will skip any mention of configuring MSCS or Windows Clustering itself; see our “Setup for Failover Clustering and Microsoft Cluster Service” guide for a better and more comprehensive description of that process.)
Configuring vMotion-compatible Shared Disks VMs (we assume here that the vSphere cluster is operational and contains ESXi 6.0)
  • Verify that the VM’s hardware version is at hardware 11

Hardware 11

  • If it’s not, you must upgrade it to 11

Upgrade Hardware 11

  • Ensure that the VMs are powered off
  • Add a virtual SCSI controller to the VMs

Add SCSI Controller

  • Set the SCSI controller’s bus sharing mode to “Physical”

SCSI Bus Sharing

  • Add a new RDM disk to the first VM

Add RDM Disks

    • (1) It is recommended that you store the mapping file in a location that is centrally and easily accessible to all the ESXi hosts in the vSphere cluster (you never know which host may house the VM at some point in the future)
    • (2) Ensure that the new disk is connected to the “Physical Mode” SCSI controller configured in previous steps

Connect to the Correct SCSI Controller

  • Power on and log into the VM. Configure and format the disk in disk manager as desired.
  • Add an “existing disk” to the second VM.

Add Existing Disk

  • Ensure that you are selecting the disk that you added to the first VM (we are sharing disks here, remember?)
  • Ensure that this disk is also connected to the “Physical Mode” SCSI controller configured in previous steps
  • Power on this VM and log into Windows.
  • The disk should be visible in disk manager on both VMs
  • Repeat this process for all other VMs that will be sharing this disk (up to 5 such VMs are supported on vSphere)
That, my friend, is the extent of the configuration steps required to share vMotion-compatible disks among VMs in vSphere 6.0
Now, go ahead and migrate any of the VMs while they are powered on.
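As a sanity check on the steps above, you may want to confirm that both nodes really reference the same mapping file on a controller set to physical bus sharing. The sketch below is illustrative only: it parses `key = "value"` text in the style of a VM's .vmx file, and the key names used (`scsiN.sharedBus`, `scsiN:M.fileName`) reflect my understanding of that format and should be treated as assumptions. The datastore path and file names are made up. Note that this does not prove the disk is an RDM; it only checks controller sharing and the common file name.

```python
import re
from typing import Dict, Set


def parse_vmx(text: str) -> Dict[str, str]:
    """Parse lines of the form key = "value" into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r'\s*([^#=\s]+)\s*=\s*"(.*)"\s*$', line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def shared_disk_paths(settings: Dict[str, str]) -> Set[str]:
    """Return disk file names attached to controllers whose bus sharing is 'physical'."""
    paths = set()
    for key, value in settings.items():
        match = re.fullmatch(r"(scsi\d+):\d+\.filename", key)
        if match is None:
            continue
        sharing = settings.get(f"{match.group(1)}.sharedbus", "none")
        if sharing.lower() == "physical":
            paths.add(value)
    return paths


# Hypothetical settings for the two clustered VMs.
NODE1_VMX = '''
virtualHW.version = "11"
scsi0:0.fileName = "node1.vmdk"
scsi1.sharedBus = "physical"
scsi1:0.fileName = "/vmfs/volumes/shared-ds/wsfc/quorum-rdm.vmdk"
'''
NODE2_VMX = '''
virtualHW.version = "11"
scsi0:0.fileName = "node2.vmdk"
scsi1.sharedBus = "physical"
scsi1:0.fileName = "/vmfs/volumes/shared-ds/wsfc/quorum-rdm.vmdk"
'''

common = shared_disk_paths(parse_vmx(NODE1_VMX)) & shared_disk_paths(parse_vmx(NODE2_VMX))
print("Disks shared by both nodes:", sorted(common))
```

If the intersection is empty, the second VM was most likely given a new disk rather than the existing one, or the disk landed on a controller without physical bus sharing.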
There is a catch – or two …. ok, maybe three catches
Yes, you knew this was coming, didn’t you?
  • VMware still does not support Storage vMotion for VMs that are configured with shared disks. If you attempt to perform a Storage vMotion on such workloads, the attempt will fail.
  • While it is technically possible to successfully use a VMDK (instead of an RDM) disk for the configuration described above, please be advised that VMware does not support such a configuration. You will be able to vMotion the workloads successfully and things will appear to behave optimally and without a hitch. PLEASE DO NOT DO SO. Such a configuration may lead to instability and data corruption. Please see “Configuring Microsoft Cluster Service fails with the error: Validate SCSI-3 Persistent Reservation” for more information.
  • Insufficient bandwidth WILL hamper your vMotion operation and cause service interruption for your clustered workloads. Wait, are you surprised? How do you suppose we get a running VM from one host to another? Teleportation? No, we copy it over the wire incrementally. We strive to complete the copy and the switch-over very rapidly. If the vMotion network is congested or insufficient (say, you try to vMotion a running “monster VM” with 128 CPUs and 4TB of RAM over a 1Gb link that is shared with other traffic), the copy and switch-over operation will take a very long time, long enough for the VM to lose heartbeat with its peer nodes and, consequently, trigger a failover or a shutdown of its cluster service for lack of quorum.

To avoid the issue described in the previous paragraph (and to ensure the overall health and functionality of your vMotion operations), VMware recommends the following:

    • Put the vMotion VMkernel portgroup on a 10Gb (or faster) network
    • If you do not have a 10Gb or faster network, create more than one vMotion VMkernel portgroup in the vSphere cluster. Use a separate 1Gb NIC for each portgroup
    • If using 1Gb NICs, consider enabling jumbo frames at all levels of the network stack (from physical switches all the way to the in-guest network card)
    • IF none of the recommended options above is possible, consider tuning the cluster services inside the Guest OS to tolerate longer heartbeat timeouts. See Tuning Failover Cluster Network Thresholds for more information and recommended settings.
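To get a feel for the orders of magnitude involved, here is a small back-of-the-envelope sketch in Python. It makes several simplifying assumptions that are mine, not VMware's: the link is fully dedicated to vMotion, memory is copied exactly once with no re-copy of dirtied pages, and 4TB is taken as 4 × 10^12 bytes. The heartbeat side assumes a WSFC `SameSubnetDelay` of 1000 ms and compares a threshold of 5 against the value of 20 suggested in the comments below; check the actual values on your own cluster.

```python
def min_copy_seconds(memory_bytes: int, link_bits_per_second: float) -> float:
    """Lower bound on the time needed to push a VM's memory across the link once."""
    return memory_bytes * 8 / link_bits_per_second


def heartbeat_tolerance_seconds(delay_ms: int, threshold: int) -> float:
    """Time a node can stay silent before its peers consider it down."""
    return delay_ms * threshold / 1000


MONSTER_VM_RAM = 4 * 10**12  # 4TB, decimal units (assumption)

for label, speed in (("1Gb", 1e9), ("10Gb", 10e9)):
    hours = min_copy_seconds(MONSTER_VM_RAM, speed) / 3600
    print(f"{label} link: at least {hours:.1f} hours to copy 4TB of RAM")

for threshold in (5, 20):
    window = heartbeat_tolerance_seconds(1000, threshold)
    print(f"SameSubnetThreshold={threshold}: peers tolerate {window:.0f}s of silence")
```

The two figures are not directly comparable, because the VM keeps running and answering heartbeats during most of the copy. The point is only that a slower or congested link stretches every phase of the migration, including the final switch-over, while the heartbeat window is measured in seconds.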

That’s all. Nothing fanciful or complicated – we took care of the complexities for you, so go ahead and vMotion that shared-disk clustered workload. But don’t forget the RDM.

47 thoughts on “Say Hello to vMotion-compatible Shared-Disks Windows Clustering on vSphere”

  1. Elden Christensen

    Great addition! I know this is something I’ve seen customers jumping up and down for. Great recommendations about scaling up intra-host networks to keep the blackout times minimal on VM movement. The challenge is that clustering is health monitoring the servers, and can’t differentiate losing communication for good or bad reasons. I agree, increasing the SameSubnetThreshold is another good mitigation, I would suggest increasing it to 10 or possibly even 20. My blog you linked has all the details.

    Ohh… and we haven’t called it “MSCS” in a decade. It’s Windows Server Failover Clustering (WSFC), it feels like finger nails on a chalkboard when you call my product by the wrong name. 🙂

    Application availability features are critical to many workloads, I love seeing our features coming together!

    Elden Christensen
    Principal Program Manager Lead
    Windows Server Clustering Group

    1. Deji

      Thanks for the comments, Elden. Beaucoup apologies for the “MSCS” misnomer – we share the same grating “finger nails on chalkboard” aversion to that name. Sadly, most of our joint clients still do a “wait…what?” double-take when we say “WSFC”, so we are compelled to say WSFC/MSCS. Do you guys make a memory-wiper or something-such?

    2. King of Bytes

      The problem, Elden, is that most of your customers still do call it MSCS. Might want to talk to your internal marketing team and get the word out to the base.

      1. Deji

        What the King said 😮

        What say you, Elden? Methinks this is a good use case for a Microsoft-branded Men-in-Black Memory Zapper :-p

    1. Deji

      @Tommy – Not for Windows Failover Clustering, at least not mainstream and not in immediate plans. We already have multi-writer flag capabilities for other use cases, and we have qualified support for its use in Windows Clustering under stringent terms.

      1. Matt Larson

        @Deji – How can I find out more information on the qualified uses for the multi-writer flag when used with Windows Clustering? Would a WSFC cluster running SQL 2014 (P-mode RDMs) with Cluster Shared Volumes be one of those supported use cases?

        1. Deji Post author

          @Matt, we do not have any official support for multi-writer flag with any Windows Failover Clustering option. Not even a qualified one. We are actively reviewing this, but there is nothing I can offer you at this moment.


  3. Pingback: Newsletter: February 21, 2015 | Notes from MWhite

  4. Jessy Ybarra

    Great news on the vMotion portion of this enhancement! Does this mean I will finally be able to snapshot VMs that share a physical RDM? I use Veeam to back up all my VMs except my SQL 2012 servers participating in an AG cluster. It’s been hell trying to find a backup solution other than local and agent-based for these servers.

    1. Deji


      We currently do not have any different plans for snapshotting. Please note that this statement does not extend to third-party vendors’ plans or features.

  5. Jayaraghavendra

    Hey Deji

    vMotion was possible from ESXi 5.5, but the disks were failing.

    Is this feature completely supported without any issue with quorum disks?

    If that is the case we are really happy, please confirm.


    1. Deji

      Thanks for the question, Jan. vMotion of disk-sharing clustered VMs is currently supported *only* on vSphere 6.0, and with specific configuration requirements. Any other configuration is unsupported and not recommended.

      Shared-disk vMotion for Windows VMs is *fully* supported in vSphere 6.0 regardless of what you use the shared disks for – quorum or data.

  6. Joe Zemaitis

    The “Mastering vSphere 6” book I received yesterday states that “vMotion and vSphere DRS are not supported using Microsoft-clustered VMs.” Looks like this topic is getting confused at the highest levels.

    1. Deji

      Hello, Joe. Thanks for bringing this to our attention. The author of that book is incorrect. We will contact them and request an update.

    2. Nick Marshall

      Hi Joe,

      I’m sorry, but the statement in the book is incorrect. I will be contacting the publisher to arrange an update to the book’s errata.

      As an author it is my responsibility to ensure that things are correct and accurate at the time of publication (along with my co-authors and technical editing team). Unfortunately this one slipped through the cracks.

      I will post an article on my blog too explaining the error to try and help with the confusion.
      Thanks for bringing it to my attention.

      Nick Marshall

  7. Guy Dallaire

    We are currently using Novell OES clusters that use physical RDM and shared SCSI bus (in a similar configuration, on ESXi 5.5) and this poses serious limitations in VM management. We cannot vMotion cluster nodes, we cannot take snapshots, we cannot Storage vMotion while the machines are running. We have to shut the nodes off before doing so, and even then, we cannot take snapshots. This is very limiting for HA.

    Do you think some of these problems would be somewhat alleviated when we move on to ESXi 6 ?


    1. Deji Post author

      @Guy, vSphere 6.0 allows you to vMotion the VMs which have been configured with physical mode RDM in bus sharing mode, without having to power down any of the clustered nodes. This is the major limitation addressed in this release of ESXi. Storage vMotion and snapshots are still unsupported.

      1. Matt Larson

        Hi Deji,

        Given the statement – vSphere 6.0 allows you to vMotion the VMs which have been configured with physical mode RDM in bus sharing mode, without having to power down any of the clustered nodes – is it safe to say that this new capability can be leveraged on Oracle RAC deployments? Was hoping to get confirmation that the Oracle RAC instance will not experience a hiccup when leveraging this new capability.

        Most of the discussion around this topic is specific to MSCS/WSFC, so any light you can shed regarding Oracle RAC will be helpful.



        1. Sudhir Balasubramanian

          Yes, Oracle RAC can be used with RDM on vSphere 6.0, which allows vMotion capability. The next question is: is it really necessary to use RDM for Oracle, especially when we are talking performance? Maybe with earlier versions of vSphere, but with the current versions, the RDM vs. VMFS discussion is becoming more and more distant with the advent of VVols.

          Short answer is yes, Oracle RAC with RDM on vSphere 6.0 can leverage vMotion capability.

  8. Rich Dowling

    Aside from the ability to vMotion VMs with shared bus pRDM, does this mean that such VMs can co-exist on the same host now? That was always a problem before – they had to be on different hosts, and vRDM had to be on the same host I think.
    I’ve not managed to find this information anywhere else.

    I know this creates an eggs in one basket situation, but when there are only 2 hosts in the farm, this would assist with maintenance tasks.

    1. Deji Post author

      Rich, there are two ways to host Windows clustered VMs on vSphere:

      Put all the Windows VMs on the same ESXi host – this is called “Cluster-in-a-Box” configuration. The requirement is that the shared disks MUST be presented to the VMs as RDMs attached to SCSI controllers that have been configured in VIRTUAL Compatibility Mode.

      Put each of the Windows VMs on a separate ESXi host – this is called “Cluster-across-Boxes” configuration. The requirement is that the shared disks MUST be presented to the VMs as RDMs attached to SCSI controllers that have been configured in PHYSICAL Compatibility Mode.

      Any other configuration (or mixture) is unsupported.

  9. Pingback: VCP6-DCV Objective 7.5 - Troubleshoot HA and DRS Configurations and Fault Tolerance - ESX Virtualization

  10. Chris Brode

    We have a SQL 2014 failover cluster connected via 10G and when we vmotion the server acting as the Owner Node it puts the DB’s into Recovery and basically we have to restart that node to cause a failover in order to get things back to working. Would be helpful to know of any troubleshooting that can be done to either prove the issue is with VMware or with the SQL config.

    1. Josh Perkins

      Chris, We are also experiencing this issue and Microsoft Premier Support is stating vMotion is not allowed for SQL server failover clusters (SQL 2014). They have pointed us to a number of KB articles stating so but nothing that has any technical details as to answer “why”.

      1. Deji Post author

        Chris and Josh, this is a fairly common phenomenon when using VSS to quiesce a clustered Windows OS and its application. Please see section 4.2 of our SQL Availability guide for the options available to you on vSphere to avoid the issue:


        Also, see: https://www.vmware.com/files/pdf/solutions/SQL_Server_on_VMware-Availability_and_Recovery_Options.pdf#page=9&zoom=auto,69,457

        1. Clay

          Hi Deji – can you clarify on the VSS issues? I could not find mention of it in the link you provided.

          I am running an ESXi/vCenter 6.0 environment with several WSFC clusters (both SQL Server and file servers). My environment meets all VMware requirements for vMotion compatibility .. Server 2012, physical mode SCSI, RDM disks, all hosts at 6.0, v11 guest hardware revision, full memory reservations, 2x10Gb vMotion between hosts, 60ms WSFC heartbeat setting, cluster-across-boxes (with DRS rules to ensure they stay apart), etc

          However when I vMotion these WSFC nodes… most particularly when I vMotion multiple VMs at once (for example in the event of placing a host in maintenance mode) I see VSS and NTFS errors in many of the guest windows event logs, sometimes causing actual WSFC cluster failovers and often causing file system level corruption (I have lost entire databases this way and had to restore them).

          I sent host log bundles into VMware who told me they couldn’t see anything in the host logs and told me to talk to Microsoft. Microsoft noted that the event logs clearly show the RDM disks are ‘disappearing without being cleanly unmounted’ briefly during the vMotion which the O/S is noticing – and is causing the NTFS corruption, so they are pointing back to VMware. I can replicate this problem over and over (although I certainly don’t like to).

          Very frustrating, as I was one of the people doing a jig when this vmotion support news came out 🙁

  11. Rob V

    One scenario that is not discussed in the context of WSFC and vMotion is using shared disks that are connected by using the Microsoft iSCSI Initiator. For example, you have two VMs configured and using WSFC, but instead of using RDM or VMFS to create quorum or data disks you connect to the LUNs inside the VMs with the iSCSI initiator. Will this scenario be supported for vMotion in the future? Technically this works in testing with production data, but once again it is never discussed whether this is a possible solution, or supported for vMotion with WSFC.

    1. Deji Post author

      If you are using in-guest iSCSI, you are by-passing the vmkernel for your storage presentation. In this configuration, there is no restriction or special consideration for vMotion. You can use vMotion for the configuration and not have to worry about supportability.

  12. Pingback: vSphere (HA, DRS, vMotion) + MSCS Testing - NZSystems

  13. Arvin k

    if my environment consists two esxi host with HA enable.

    1. based on your configuration, SAN zoning should zone together with both esxi host’s WWNs and Storage?

    2. instead of zone the physical WWN, can we use NPIV feature?

    3. if YES, what is the limitation on MSSQL cluster? supported configuration?

    1. Deji Post author

      We always recommend an N+1 configuration when you intend to host shared-disk clustered Windows VM on vSphere. If you are going to have a 2-node Windows cluster, you need at least 3 ESXi hosts in order to ensure optimum availability. The only exception is if you are intentionally creating a “Cluster in a Box” configuration – where all the Windows VM will be on the same ESXi host. For a production environment, this defeats the original need for Windows clustering because the ESXi host becomes a single-point-of-failure.

      We highly recommend (insist is too strong a word) that you use anti-affinity to ALWAYS separate the clustered node to ensure that they don’t co-exist on the same ESXi host.

      All the ESXi hosts that can possibly host your Windows cluster nodes MUST see the LUNs where the VMs reside. This means that you need to zone those LUNs to all the ESXi in the vSphere cluster where you have the Windows VMs. The exception is if you have manually configured a host-vm anti-affinity rule where you intentionally tell the VMs that they must never land on some specific ESXi hosts.

      Please see http://kb.vmware.com/kb/1037959 for information regarding the official maximum Windows cluster size supported on vSphere for one Windows Server Failover Cluster (or MSCS) instance.

      NPIV is still unsupported for Windows clustering. Please see vSphere MSCS Setup Limitations

  14. Matt Jamison

    Any reason a Virtual RDM won’t work? Would love to be able to make snapshots for backups via Veeam.

  15. Deji Post author

    Virtual RDM is supported only for “Cluster-in-a-Box” configuration – where ALL the Windows clustered nodes are residing (and will forever reside) on the same ESXi at the same time.

  16. Austin

    @Deji, I have a system being installed that is going to use multi-writer volumes. We have been told that since it can’t be snapshot, it is not possible to perform a backup of the disk for disaster recovery purposes. My question is, Can a 3rd party disaster recovery vendor, (in this case either NetBackup or HP Data Protector) backup this multi-writer volume using a locally installed client on the VM by treating the volume like any other disk on a physical host. If the volume fails then a standard restore be performed while all application software is down.

    1. Deji Post author

      Will these VMs be running Windows OS? If yes, we don’t support MWF with Windows clustering under any condition.

  17. Phil

    If I have a virtual WSFC using RDM as shared disks can a shared RDM be increased in size whilst the cluster nodes are online?

  18. Wee Kiong Tan

    If we use the same SAN for the MSCS RDM mapped to two cluster of vSphere. OS disk of the MSCS is on SAN. Can we perform a Storage vMotion to only move the OS disk to another cluster and keep the SAN for RDM? Looking at this for a tech refresh migration from old to new cluster of server hardware.

    1. Deji Post author

      Yes, you can selectively migrate a VM’s disks, if you select the “Advanced” option on the “select storage” menu of the wizard. This gives you the option to select the target datastore(s) for disks you want to move. Leave the disk (VMDK or RDM) that you don’t want to move alone and sVmotion will leave it where it is.

