This is an issue that has come up time and time again. The basic gist of the problem is that when Microsoft Cluster Service (MSCS) virtual machines are deployed across ESXi hosts (commonly referred to as Cluster Across Boxes, or CAB), the virtual machines share access to disks, which are typically Raw Device Mappings (RDMs). RDMs are LUNs presented directly to virtual machines. The ESXi host being rebooted is assumed to hold the passive virtual machines/cluster nodes at that point, so the other ESXi host or hosts hold the active virtual machines/cluster nodes. Since the active nodes hold SCSI reservations on the shared disks/RDMs, the rebooting ESXi host is slowed down as it tries to interrogate each of these disks during storage discovery. So what can you do to alleviate it? Read on and find out.
As mentioned, this issue has been around for a while. The resolution is basically to get the ESXi host storage discovery process to skip over the LUNs on which a SCSI reservation has been detected. On ESX/ESXi 4.0, we recommended changing the advanced option Scsi.UWConflictRetries to 80. In ESX/ESXi 4.1, a new advanced option called Scsi.CRTimeoutDuringBoot was introduced (CR is short for Conflict Retries), and the recommendation was to set this value to 1 to speed up the boot process. What these settings did was effectively get the discovery process to move on as quickly as possible once a SCSI reservation was detected.
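To make the two legacy recommendations concrete, here is a minimal Python sketch that only builds the command strings for each release; it does not run anything on a host. The option names and values (80 and 1) come from the text above. The use of esxcfg-advcfg with a /Scsi/... path is an assumption about how the option would be set from the command line, and the function name is purely illustrative, so check KB 1016106 before using the output.

```python
# Sketch only: builds command strings, it does not execute them.
# Option names and values are taken from the text above; the
# esxcfg-advcfg path form ("/Scsi/<Option>") is an assumption.
LEGACY_SETTINGS = {
    "4.0": ("/Scsi/UWConflictRetries", 80),
    "4.1": ("/Scsi/CRTimeoutDuringBoot", 1),
}


def legacy_boot_command(version: str) -> str:
    """Return the advanced-option command recommended for a 4.x host."""
    if version not in LEGACY_SETTINGS:
        raise ValueError(f"no legacy setting recorded for version {version!r}")
    path, value = LEGACY_SETTINGS[version]
    return f"esxcfg-advcfg -s {value} {path}"
```

Calling legacy_boot_command("4.1") returns the string for the Scsi.CRTimeoutDuringBoot recommendation, which can then be reviewed and run by hand on the host.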
In ESXi 5.0 & 5.1, a new setting was introduced to make this whole process much smarter. A new flag allows an administrator to mark a LUN/RDM as ‘perennially reserved’. This tells the SCSI mid-layer of the VMkernel not to query the device during the ‘discovery’ process. This speeds up the boot process when you have MSCS running in virtual machines and you need to boot any of the ESXi hosts which own these virtual machines and associated storage.
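As a rough illustration of how the flag might be applied to a batch of RDMs, the Python sketch below generates one esxcli command per device and checks a device listing for the flag. It assumes the esxcli storage core device setconfig syntax with --perennially-reserved, and an "Is Perennially Reserved" field in the device list output; the function names and the naa identifier are hypothetical placeholders. Nothing here talks to a host, it only produces and parses text.

```python
# Sketch only: generates and parses text, nothing is executed on a host.
# The esxcli syntax and the "Is Perennially Reserved" field name are
# assumptions to verify against KB 1016106; device IDs are placeholders.

def perennially_reserved_commands(device_ids, reserved=True):
    """Build one setconfig command per shared RDM device."""
    flag = "true" if reserved else "false"
    return [
        f"esxcli storage core device setconfig -d {dev} "
        f"--perennially-reserved={flag}"
        for dev in device_ids
    ]


def is_perennially_reserved(device_list_output: str) -> bool:
    """Look for the flag in the text of a single-device listing."""
    for line in device_list_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Is Perennially Reserved":
            return value.strip().lower() == "true"
    return False  # field absent: treat the device as not marked
```

The generated commands would need to be run on every ESXi host that has visibility of the shared RDMs, since the flag is a per-host device setting.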
Much more information can be found in KB article 1016106 and of course the documentation center here.
Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage