posted

0 Comments

Previous deep dive posts have dealt with performance issues or faulty hardware. This week Nathan Small (Twitter handle: vSphereStorage) takes us through the determination of root cause for a SCSI Reservation Conflict issue:

History of issue:

Customer performed a firmware upgrade to their IBM SVC’s. Upon completing the firmware update and bringing them back online, command failures and SCSI reservation conflicts were observed across all hosts. Issue was resolved by sending LUN reset (vmkfstools -L lunreset /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxxx) to the affected LUNs. Root cause analysis requested.

We need to start by ensuring that all hosts saw the issue the same way, and that the firmware update sequence logically makes sense. Let’s start by looking at the firmware update sequence, and to determine when it began and finished. We have to locate when the paths to one target dropped:

2012-04-29T13:18:19.684Z cpu10:4768)<3> rport-5:0-4: blocked FC remote port time out: saving binding

2012-04-29T13:18:19.685Z cpu2:4794)<3>lpfc820 0000:04:00.0: 0:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:30:59:fd NPort x3f010d Data: x0 x7 x0

2012-04-29T13:18:19.713Z cpu12:4759)<3> rport-6:0-4: blocked FC remote port time out: saving binding

2012-04-29T13:18:19.714Z cpu7:4795)<3>lpfc820 0000:04:00.1: 1:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:10:59:fd NPort x5d0010 Data: x0 x7 x0

From the above we can see there were 2 paths dropped, 1 per HBA. Infact, these Emulex device loss messages even give you the WWPN of the IBM SVC targets:

WWPN 50:05:07:68:01:30:59:fd

WWPN 50:05:07:68:01:10:59:fd

For our example, we will refer to LUN 42 when comparing WWPNs of targets and tracking path failures:

naa.60050768018900725000000000000810 : IBM Fibre Channel Disk (naa.60050768018900725000000000000810)

   vmhba1:C0:T2:L42 LUN:42 state:active fc Adapter: WWNN: 20:00:00:00:c9:a2:26:87 WWPN: 10:00:00:00:c9:a2:26:87  Target: WWNN: 50:05:07:68:01:00:59:fd WWPN: 50:05:07:68:01:10:59:fd

   vmhba1:C0:T1:L42 LUN:42 state:active fc Adapter: WWNN: 20:00:00:00:c9:a2:26:87 WWPN: 10:00:00:00:c9:a2:26:87  Target: WWNN: 50:05:07:68:01:00:59:fc WWPN: 50:05:07:68:01:20:59:fc

   vmhba0:C0:T3:L42 LUN:42 state:active fc Adapter: WWNN: 20:00:00:00:c9:a2:26:86 WWPN: 10:00:00:00:c9:a2:26:86  Target: WWNN: 50:05:07:68:01:00:59:fc WWPN: 50:05:07:68:01:40:59:fc

   vmhba0:C0:T2:L42 LUN:42 state:active fc Adapter: WWNN: 20:00:00:00:c9:a2:26:86 WWPN: 10:00:00:00:c9:a2:26:86  Target: WWNN: 50:05:07:68:01:00:59:fd WWPN: 50:05:07:68:01:30:59:fd

Based on the Device Loss messages from the Emulex (lpfc820) driver, we can see that the first and last paths listed were the ones that went down. This means that the other two paths to this LUN can still be used for access.

A non-disruptive upgrade sequence involves taking one array controller offline, apply firmware update, bring controller online, and then apply the same sequence to the other controller. Each of these sequences requires the connected ESX hosts to failover paths (MRU/FIXED) or remove those dead paths from the working path list (RR aka Round Robin). In this environment the PSP (Path Selection Policy) used is FIXED so we see failover events:

2012-04-29T13:18:19.745Z cpu10:6334)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x2a (0x4124011a3a00) to dev “naa.60050768018900725000000000000810” on path “vmhba0:C0:T2:L42” Failed: H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.Act:FAILOVER

2012-04-29T13:18:19.745Z cpu10:6334)WARNING: NMP: nmp_DeviceRetryCommand:133:Device “naa.60050768018900725000000000000810”: awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

The above messages shows us a Host status of ‘1’ (H:0x1), which translates to NO_CONNECT, and is a valid failover condition. For more information on NMP host statuses, See KB 1029039. Next we will see actual failover sequence:

2012-04-29T13:18:19.747Z cpu9:4278)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from vmhba0:C0:T2:L42 to vmhba1:C0:T1:L42 for device “naa.60050768018900725000000000000810”.

2012-04-29T13:18:19.747Z cpu2:4786)WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device “naa.60050768018900725000000000000810” – issuing command 0x4124011a3a00

2012-04-29T13:18:19.748Z cpu6:7233)NMP: nmpCompleteRetryForPath:321: Retry world recovered device “naa.60050768018900725000000000000810”

When you see “Retry world recovered device” or “Retry world restored device”, this means that the path has switched and commands are completing on the new path. The firmware update is applied, controller is rebooted, and then brought back online.

16 minutes later we observe LUN resets for all paths/LUN on the controller that was brought down for the firmware update

<snip>

2012-04-29T13:34:16.269Z cpu1:4144)ScsiCore: 1455: Power-on Reset occurred on vmhba1:C0:T2:L42

2012-04-29T13:34:16.270Z cpu1:4144)ScsiCore: 1455: Power-on Reset occurred on vmhba0:C0:T2:L42

<snip>

This is the result of the SVC coming back online after the firmware upgrade. There are no subsequent messages showing that the ESX host has failed back to the original paths, however the reason for this is because there were no preferred paths set by the customer. For more information on this behavior, see KB Preferred path behaves like MRU if preferred paths are not specified in fixed path policy (1009414).

Now it is time to apply the firmware update to the second IBM SVC:

2012-04-29T14:21:23.073Z cpu20:4778)<3> rport-5:0-5: blocked FC remote port time out: saving binding

2012-04-29T14:21:23.074Z cpu20:4794)<3>lpfc820 0000:04:00.0: 0:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:40:59:fc NPort x3f010b Data: x0 x7 x0

2012-04-29T14:21:23.143Z cpu2:4762)<3> rport-6:0-3: blocked FC remote port time out: saving binding

2012-04-29T14:21:23.144Z cpu18:4795)<3>lpfc820 0000:04:00.1: 1:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:20:59:fc NPort x5d000f Data: x0 x7 x0

This time we see the other IBM SVC device disappear:

WWPN 50:05:07:68:01:40:59:fc

WWPN 50:05:07:68:01:20:59:fc

Just as before, we see the host status of “H:0x1” or NO_CONNECT, which requires us to failover:

2012-04-29T14:21:23.153Z cpu2:7233)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x2a (0x412400dd5080) to dev “naa.60050768018900725000000000000810” on path “vmhba1:C0:T1:L42” Failed: H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.Act:FAILOVER

2012-04-29T14:21:23.153Z cpu2:7233)WARNING: NMP: nmp_DeviceRetryCommand:133:Device “naa.60050768018900725000000000000810”: awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

Then we see failover is successful:

2012-04-29T14:21:23.154Z cpu9:4283)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from vmhba1:C0:T1:L42 to vmhba1:C0:T2:L42 for device “naa.60050768018900725000000000000810”.

2012-04-29T14:21:23.154Z cpu6:4786)WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device “naa.60050768018900725000000000000810” – issuing command 0x412400dd5080

2012-04-29T14:21:27.640Z cpu2:4132)WARNING: NMP: nmpCompleteRetryForPath:429:Retry world restored device “naa.60050768018900725000000000000810” – no more commands to retry

Later we observe more LUN resets, indicating that the IBM SVC is back online:

<snip>

2012-04-29T14:37:29.305Z cpu17:4759)ScsiCore: 1455: Power-on Reset occurred on vmhba1:C0:T1:L42

2012-04-29T14:37:29.306Z cpu23:4759)ScsiCore: 1455: Power-on Reset occurred on vmhba0:C0:T3:L42

<snip>

At this point it looks like a clean and simple firmware upgrade to both controllers has completed, however shortly after the last controller came back online, we see a flood of SCSI commands to LUNs begin to fail with various host statuses:

2012-04-29T14:37:46.430Z cpu5:4101)ScsiDeviceIO: 2305: Cmd(0x412400d7ce40) 0x2a, CmdSN 0x104bbb to dev “naa.60050768018900725000000000000672” failed H:0xb D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2012-04-29T14:37:46.430Z cpu5:4755)ScsiDeviceIO: 2305: Cmd(0x412401f5e3c0) 0x2a, CmdSN 0x2c47eb to dev “naa.600507680189007250000000000007d7” failed H:0xb D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2012-04-29T14:37:48.427Z cpu1:8559)ScsiDeviceIO: 2305: Cmd(0x412400d7ce40) 0x2a, CmdSN 0x40b3c1 to dev “naa.6005076801890072500000000000080d” failed H:0xb D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2012-04-29T14:37:48.427Z cpu3:4099)ScsiDeviceIO: 2305: Cmd(0x412401d2e5c0) 0x2a, CmdSN 0x54f766 to dev “naa.6005076801890072500000000000065e” failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2012-04-29T14:37:48.497Z cpu13:4109)ScsiDeviceIO: 2288: Cmd(0x412401f94840) 0x2a, CmdSN 0x80000015 to dev “naa.6005076801890072500000000000065e” failed H:0x8 D:0x0 P:0x0

This is where all hosts started to observe an inability to complete SCSI commands to the IBM SVC. All hosts are littered with the above failure messages, and with Emulex verbose logging we might be able to determine why different host errors statuses were returned, but that isn’t important right now. This behavior eventually leads to the following:

2012-04-29T14:38:17.866Z cpu17:134520)ScsiDeviceIO: 2305: Cmd(0x412440df52c0) 0x16, CmdSN 0x207323 to dev “naa.6005076801890072500000000000065e” failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2012-04-29T14:38:17.866Z cpu17:134520)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T2:L17, reservation state on device naa.6005076801890072500000000000065e is unknown.

2012-04-29T14:38:18.804Z cpu4:4100)ScsiDeviceIO: 2305: Cmd(0x412401f5e3c0) 0x16, CmdSN 0x207324 to dev “naa.6005076801890072500000000000061e” failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2012-04-29T14:38:18.804Z cpu4:4100)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L7, reservation state on device naa.6005076801890072500000000000061e is unknown.

What you are looking at here is a command failure for SCSI command 0x16, which is a SCSI-2 RESERVE command. Not only is this a command failure, but we specifically state that the reservation state of a LUN is “unknown”. This message is displayed when we attempt to issue a SCSI-2 RESERVE and the command does not complete. When we attempt to send an abort for that command and the abort does not complete either, we legitimately do not know what the reservation state of the LUN is at that point. This will typically only happen if there are severe problems trying to communicate with the device, whether the problem is fabric related or the array itself. The fact that this is an IBM SVC will add a layer of complexity to the analysis since there are arrays behind it.

Next we see Reservation errors:

2012-04-29T14:38:28.606Z cpu14:4133)WARNING: FS3: 1835: Reservation error: Timeout

2012-04-29T14:38:28.807Z cpu22:4131)WARNING: FS3: 1835: Reservation error: IO was aborted

Then eventually we see the SATP for the IBM SVC report SCSI Reservation Conflicts for LUNs 1, 2, and 14:

2012-04-29T14:39:53.082Z cpu7:4277)VMW_SATP_SVC: satp_svc_UpdatePath:213: Failed to update path “vmhba0:C0:T3:L1” state. Status=SCSI reservation conflict

2012-04-29T14:39:53.082Z cpu17:4278)VMW_SATP_SVC: satp_svc_UpdatePath:213: Failed to update path “vmhba1:C0:T1:L2” state. Status=SCSI reservation conflict

2012-04-29T14:43:27.452Z cpu13:4277)VMW_SATP_SVC: satp_svc_UpdatePath:213: Failed to update path “vmhba1:C0:T1:L14” state. Status=SCSI reservation conflict

This needs to be our area of focus. The issue was finally resolved by issuing a LUN reset to the affected LUNs:

2012-04-30T02:26:32.197Z cpu23:2689589)WARNING: NMP: nmpDeviceTaskMgmt:2210:Attempt to issue lun reset on device naa.6005076801890072500000000000083b. This will clear any SCSI-2 reservations on the device.

2012-04-30T02:26:32.204Z cpu23:2689589)<3>lpfc820 0000:04:00.1: 1:(0):0713 SCSI layer issued Device Reset (1, 14) reset status x2002 flush status x2002

2012-04-30T02:26:32.204Z cpu23:2689589)Resv: 618: Executed out-of-band lun reset on naa.6005076801890072500000000000083b

2012-04-30T02:26:32.873Z cpu6:2689590)WARNING: NMP: nmpDeviceTaskMgmt:2210:Attempt to issue lun reset on device naa.60050768018900725000000000000672. This will clear any SCSI-2 reservations on the device.

2012-04-30T02:26:32.879Z cpu6:2689590)<3>lpfc820 0000:04:00.0: 0:(0):0713 SCSI layer issued Device Reset (2, 2) reset status x2002 flush status x2002

2012-04-30T02:26:32.879Z cpu6:2689590)Resv: 618: Executed out-of-band lun reset on naa.60050768018900725000000000000672

2012-04-30T02:29:28.765Z cpu18:1055548)WARNING: NMP: nmpDeviceTaskMgmt:2210:Attempt to issue lun reset on device naa.6005076801890072500000000000095f. This will clear any SCSI-2 reservations on the device.

2012-04-30T02:29:28.773Z cpu18:1055548)<3>lpfc820 0000:04:00.1: 1:(0):0713 SCSI layer issued Device Reset (2, 22) reset status x2002 flush status x2002

2012-04-30T02:29:28.773Z cpu18:1055548)Resv: 618: Executed out-of-band lun reset on naa.6005076801890072500000000000095f

Why did a SCSI-2 RESERVE get left on a LUN requiring a manual LUN reset be issued to the device to clear it? Well, as stated above, we observed a host state that the reservation state of a LUN was unknown because of the following sequence:

1. Issue SCSI command 0x16 (SCSI RESERVE)

2. Command fails (timeout)

3. Issue abort for the SCSI command 0x16

4. Command may fail

5. Reservation state of device is unknown

When we look at the other hosts during that time period, they also had difficulty completing SCSI commands, specifically SCSI-2 RESERVE resulting in the other hosts not knowing the reservation state of LUNs:

esx-esxhost1.local-2012-04-30–14.41/var/run/log/vmkernel.1:2012-04-29T14:38:02.029Z cpu5:4101)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L17, reservation state on device naa.6005076801890072500000000000065e is unknown.

esx-esxhost1.local-2012-04-30–14.41/var/run/log/vmkernel.1:2012-04-29T14:38:02.029Z cpu21:4117)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L16, reservation state on device naa.6005076801890072500000000000065d is unknown.

esx-esxhost1.local-2012-04-30–14.41/var/run/log/vmkernel.1:2012-04-29T14:38:02.029Z cpu21:4117)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L39, reservation state on device naa.6005076801890072500000000000080d is unknown.

esx-esxhost1.local-2012-04-30–14.41/var/run/log/vmkernel.1:2012-04-29T14:38:42.028Z cpu17:4113)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L18, reservation state on device naa.6005076801890072500000000000066f is unknown.

esx-esxhost1.local-2012-04-30–14.41/var/run/log/vmkernel.1:2012-04-29T14:38:42.028Z cpu3:4099)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L2, reservation state on device naa.60050768018900725000000000000672 is unknown.

esx-esxhost1.local-2012-04-30–14.41/var/run/log/vmkernel.1:2012-04-29T14:38:42.029Z cpu17:4113)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L15, reservation state on device naa.600507680189007250000000000007d7 is unknown.

esx-esxhost1.local-2012-04-30–14.41/var/run/log/vmkernel.1:2012-04-29T14:39:22.028Z cpu10:550864)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L39, reservation state on device naa.6005076801890072500000000000080d is unknown.

esx-esxhost1.local-2012-04-30–14.41/var/run/log/vmkernel.1:2012-04-29T14:39:22.028Z cpu18:7351)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L14, reservation state on device naa.6005076801890072500000000000083b is unknown.

esx-esxhost1.local-2012-04-30–14.41/var/run/log/vmkernel.1:2012-04-29T14:39:22.028Z cpu10:550864)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L17, reservation state on device naa.6005076801890072500000000000065e is unknown.

esx-esxhost2.local-2012-04-30–14.52/var/run/log/vmkernel.1:2012-04-29T14:38:17.866Z cpu17:134520)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T2:L17, reservation state on device naa.6005076801890072500000000000065e is unknown.

esx-esxhost2.local-2012-04-30–14.52/var/run/log/vmkernel.1:2012-04-29T14:38:18.804Z cpu4:4100)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T0:L7, reservation state on device naa.6005076801890072500000000000061e is unknown.

esx-esxhost2.local-2012-04-30–14.52/var/run/log/vmkernel.1:2012-04-29T14:38:18.866Z cpu23:2134223)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T2:L39, reservation state on device naa.6005076801890072500000000000080d is unknown.

esx-esxhost2.local-2012-04-30–14.52/var/run/log/vmkernel.1:2012-04-29T14:38:57.866Z cpu15:2134223)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T2:L3, reservation state on device naa.60050768018900725000000000000721 is unknown.

esx-esxhost2.local-2012-04-30–14.52/var/run/log/vmkernel.1:2012-04-29T14:38:58.866Z cpu2:1781073)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T2:L18, reservation state on device naa.6005076801890072500000000000066f is unknown.

esx-esxhost2.local-2012-04-30–14.52/var/run/log/vmkernel.1:2012-04-29T14:38:58.866Z cpu23:1675016)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T2:L15, reservation state on device naa.600507680189007250000000000007d7 is unknown.

esx-esxhost2.local-2012-04-30–14.52/var/run/log/vmkernel.1:2012-04-29T14:39:37.866Z cpu17:2154857)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T2:L1, reservation state on device naa.600507680189007250000000000006d7 is unknown.

esx-esxhost2.local-2012-04-30–14.52/var/run/log/vmkernel.1:2012-04-29T14:39:38.866Z cpu20:1675016)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T2:L17, reservation state on device naa.6005076801890072500000000000065e is unknown.

esx-esxhost2.local-2012-04-30–14.52/var/run/log/vmkernel.1:2012-04-29T14:39:38.866Z cpu20:1675016)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T2:L2, reservation state on device naa.60050768018900725000000000000672 is unknown.

esx-esxhost3.local-2012-04-30–09.49/var/run/log/vmkernel.log:2012-04-29T09:40:01.001Z cpu8:4104)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T0:L14, reservation state on device naa.6005076801890072500000000000083b is unknown.

esx-esxhost3.local-2012-04-30–09.49/var/run/log/vmkernel.log:2012-04-29T09:40:41.000Z cpu0:4120)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba0:C0:T0:L14, reservation state on device naa.6005076801890072500000000000083b is unknown.

esx-esxhost4.local-2012-04-30–14.37/var/run/log/vmkernel.1:2012-04-29T14:39:00.633Z cpu17:4113)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T3:L7, reservation state on device naa.6005076801890072500000000000061e is unknown.

esx-esxhost4.local-2012-04-30–14.37/var/run/log/vmkernel.1:2012-04-29T14:39:01.633Z cpu11:4107)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T1:L3, reservation state on device naa.60050768018900725000000000000721 is unknown.

esx-esxhost4.local-2012-04-30–14.37/var/run/log/vmkernel.1:2012-04-29T14:39:01.633Z cpu15:14989)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T1:L15, reservation state on device naa.600507680189007250000000000007d7 is unknown.

esx-esxhost4.local-2012-04-30–14.37/var/run/log/vmkernel.1:2012-04-29T14:39:40.633Z cpu3:4099)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T1:L1, reservation state on device naa.600507680189007250000000000006d7 is unknown.

esx-esxhost4.local-2012-04-30–14.37/var/run/log/vmkernel.1:2012-04-29T14:39:41.633Z cpu5:4101)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T1:L14, reservation state on device naa.6005076801890072500000000000083b is unknown.

esx-esxhost4.local-2012-04-30–14.37/var/run/log/vmkernel.1:2012-04-29T14:39:41.633Z cpu16:15845)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T3:L7, reservation state on device naa.6005076801890072500000000000061e is unknown.

esx-esxhost4.local-2012-04-30–14.37/var/run/log/vmkernel.1:2012-04-29T14:40:20.634Z cpu15:4174)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T1:L3, reservation state on device naa.60050768018900725000000000000721 is unknown.

esx-esxhost4.local-2012-04-30–14.37/var/run/log/vmkernel.1:2012-04-29T14:40:21.634Z cpu17:4113)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T1:L15, reservation state on device naa.600507680189007250000000000007d7 is unknown.

esx-esxhost4.local-2012-04-30–14.37/var/run/log/vmkernel.1:2012-04-29T14:40:21.634Z cpu17:4113)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba1:C0:T1:L1, reservation state on device naa.600507680189007250000000000006d7 is unknown.

This shows us that more than LUN 1, 2, and 14 reported the reservation state unknown errors, though LUNs 1, 2, and 14 were the only LUNs that had a SCSI-2 RESERVE left on the volume. The reason why one LUN had the reserve left on it while another did not is purely due to whatever operation a host was doing when it attempted to grab the reserve. Determining why this was the case for one LUN over another is not as important as why the hosts started to have difficulty completing SCSI commands to the IBM SVC following the reboot of the last controller after a firmware update.

These unknown reservation states occurred for a 2 minute window between 2012-04-29T14:38:02.029Z and 2012-04-29T14:40:21.634Z. Since this is UTC time, this translates to 9:38am to 9:40am, which the time period in which the last SVC reboot occurred and then brought its paths back online (9:38am) to the moment the SVC had fully logged back into its targets (9:40am). This can be illustrated with the following entries from the IBM SVC logs:

Error Log Entry 960

  Node Identifier       : SVCN4

  Object Type           : node

  Object ID             : 13

  Copy ID               :

  Sequence Number       : 30815

  Root Sequence Number  : 30815

  First Error Timestamp : Sun Apr 29 09:38:13 2012

                        : Epoch + 1335710293

  Last Error Timestamp  : Sun Apr 29 09:38:13 2012

                        : Epoch + 1335710293

  Error Count           : 1

  Error ID              : 987102 : Node coldstarted

  Error Code            :

  Status Flag           : SNMP trap raised

  Type Flag             : INFORMATION

Error Log Entry 6438

  Node Identifier       : SVCN3

  Object Type           : cluster

  Object ID             : 0

  Copy ID               :

  Sequence Number       : 0

  Root Sequence Number  : 0

  First Error Timestamp : Sun Apr 29 09:38:18 2012

                        : Epoch + 1335710298

  Last Error Timestamp  : Sun Apr 29 09:38:18 2012

                        : Epoch + 1335710298

  Error Count           : 1

  Error ID              : 990415 : Software Install completed

  Error Code            :

  Status Flag           :

  Type Flag             : CONFIGURATION EVENT

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Error Log Entry 962

  Node Identifier       : SVCN4

  Object Type           : cluster

  Object ID             : 0

  Copy ID               :

  Sequence Number       : 30816

  Root Sequence Number  : 30813

  First Error Timestamp : Sun Apr 29 09:40:43 2012

                        : Epoch + 1335710443

  Last Error Timestamp  : Sun Apr 29 09:40:43 2012

                        : Epoch + 1335710443

  Error Count           : 1

  Error ID              : 981003 : FC discovery occurred, configuration changes complete

  Error Code            :

  Status Flag           : SNMP trap raised

  Type Flag             : INFORMATION

01 02 00 00 04 1E 00 00 04 04 00 00 06 00 06 00

06 00 00 00 50 01 00 00 60 00 60 00 60 00 30 00

00 00 00 00 00 00 00 00 68 FE 01 00 66 00 00 00

32 01 00 00 F0 FF FF FF 0B 00 00 00 66 00 00 00

66 00 66 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Error Log Entry 420

  Node Identifier       : SVCN3

  Object Type           : cluster

  Object ID             : 0

  Copy ID               :

  Sequence Number       : 30817

  Root Sequence Number  : 30813

  First Error Timestamp : Sun Apr 29 09:40:43 2012

                        : Epoch + 1335710443

  Last Error Timestamp  : Mon Apr 30 01:00:12 2012

                        : Epoch + 1335765612

  Error Count           : 2

  Error ID              : 981004 : FC discovery occurred, no configuration changes were detected

  Error Code            :

  Status Flag           : SNMP trap raised

  Type Flag             : INFORMATION

01 01 00 00 1C 1E 00 00 04 04 00 00 06 00 06 00

00 00 00 00 50 01 00 00 60 00 60 00 60 00 30 00

00 00 00 00 00 00 00 00 68 FE 01 00 00 00 00 00

98 01 00 00 F0 FF FF FF 0F 00 00 00 66 00 00 00

00 00 66 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The collective ESX logs alone clearly show that the SVC was not completing all SCSI commands from the hosts within the 5 second timeout window during the timeframe of 9:38 and 9:40, but some of those commands did complete, particularly a few SCSI-2 reserves that the ESX hosts lost track of due to the timeout events. Also, based on the ESX hosts logs and the SVC logs, we would postulate that the SVC advertised its paths as online and active before it had fully logged into the arrays behind it. This means that commands could have been queued up on the SVC side while waiting for login to complete and then flush those commands through, which may have explained the behavior we observed.

While the SCSI-2 Reserve was a symptom of the issue, this scenario as well as many other scenarios regarding SCSI Reservation Conflicts can be avoid by utilizing the hardware assisted locking feature of VAAI, also known as ATS (Atomic Test and Set). This type of locking will not prevent another host from accessing the volume. This feature was introduced in vSphere 4.1 and improved in vSphere 5.0, and requires a storage array that supports VAAI.

For more information on VAAI and ATS, please see KB vStorage APIs for Array Integration FAQ (1021976).