
Category Archives: Tech Talk

Deep-dive technical explorations

A look at All Paths Down in vSphere

Today we have a guest post from Karthick Sivaramakrishnan, a three-year veteran at VMware. His primary fields of expertise are vSphere Storage and Site Recovery Manager.

This blog post covers how ESXi handles unscheduled storage disconnects on vSphere 5.x and 6.x. An unscheduled storage disconnect means that some issue in the vSphere environment has led to an All Paths Down (APD) condition for a datastore. An APD condition occurs when the ESXi host no longer has any path to communicate with a LUN on the storage array.

An ESXi host can encounter an APD condition for several reasons. As a result, VMs running on the affected datastore may go down, the host may get disconnected from vCenter, and in the worst case ESXi itself may become unresponsive.

From vSphere 5.x onwards, ESXi can distinguish whether a disconnect is transient or permanent. A transient disconnect leads to an All Paths Down (APD) state, in which ESXi expects the device loss to be temporary. In a Permanent Device Loss (PDL) state, the device is expected to have a non-recoverable issue, such as a hardware error or an unmapped LUN.

In the example below, all iSCSI datastores are in an inactive state.

[Screenshot: Datastores view]

To determine what caused this issue, we check the ESXi logs, particularly vmkernel.log and vobd.log. The APD events are evident in the vmkernel log.

vmkernel log

2017-01-10T13:04:26.803Z cpu1:32896)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.6000eb31dffdc33a0000000000000028] has entered the All Paths Down state.

2017-01-10T13:04:26.818Z cpu0:32896)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.6000eb31dffdc33a000000000000002a] has entered the All Paths Down state.
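On a busy host the vmkernel log can be long, so pulling out only the APD entries makes it easier to see which devices were affected and when. The following is a minimal Python sketch, not a VMware tool: it assumes the message layout matches the two sample lines above, and the names `APD_ENTERED` and `apd_entries` are hypothetical.

```python
import re

# Pattern for the vmkernel APD-entry message. It is based on the sample
# lines in this post, not on a documented log format (assumption).
APD_ENTERED = re.compile(
    r"(?P<ts>\S+) cpu\d+:\d+\)StorageApdHandlerEv: \d+: "
    r"Device or filesystem with identifier \[(?P<device>[^\]]+)\] "
    r"has entered the All Paths Down state\."
)


def apd_entries(lines):
    """Yield (timestamp, device identifier) for every APD-entry line."""
    for line in lines:
        match = APD_ENTERED.search(line)
        if match:
            yield match.group("ts"), match.group("device")


# The two vmkernel lines shown above.
sample = [
    "2017-01-10T13:04:26.803Z cpu1:32896)StorageApdHandlerEv: 110: Device or "
    "filesystem with identifier [naa.6000eb31dffdc33a0000000000000028] has "
    "entered the All Paths Down state.",
    "2017-01-10T13:04:26.818Z cpu0:32896)StorageApdHandlerEv: 110: Device or "
    "filesystem with identifier [naa.6000eb31dffdc33a000000000000002a] has "
    "entered the All Paths Down state.",
]

for ts, device in apd_entries(sample):
    print(ts, device)
```

To scan a real log, pass an open file object (for example, a copy of vmkernel.log taken from a support bundle) instead of `sample`.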

vobd log

2017-01-10T13:04:26.905Z: [scsiCorrelator] 475204262us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.6000eb31dffdc33a0000000000000028. Path vmhba33:C0:T1:L0 is down. Affected datastores: “Green”.

2017-01-10T13:04:26.905Z: [scsiCorrelator] 475204695us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.6000eb31dffdc33a000000000000002a. Path vmhba33:C0:T0:L0 is down. Affected datastores: “Grey”.

These logs show that the ESXi host has lost connectivity to the datastores. Any virtual machines using the affected datastores may become unresponsive. In this example, while the datastores were mounted on the ESXi host, the network uplink on the NIC used for the iSCSI connection went down. This was a transient issue, and the datastores came back once the network uplink was restored.
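The vobd entries add what the vmkernel APD messages do not: the path that went down and the datastore backed by the device. A minimal Python sketch that collects this per device, assuming the message layout of the two vobd lines above, with datastore names in straight or curly quotes; `connectivity_lost` is a hypothetical helper, not part of any VMware tooling.

```python
import re

# Pattern for the vobd "connectivity lost" message, based on the sample
# lines in this post (assumption, not a documented format).
CONN_LOST = re.compile(
    r"\[esx\.problem\.storage\.connectivity\.lost\] "
    r"Lost connectivity to storage device (?P<device>\S+?)\. "
    r"Path (?P<path>\S+) is down\. "
    r"Affected datastores: (?P<datastores>.*)\.$"
)
# Datastore names in straight or curly double quotes.
QUOTED_NAME = re.compile(r'["\u201c]([^"\u201d]+)["\u201d]')


def connectivity_lost(lines):
    """Return {device: (down path, [affected datastores])} from vobd lines."""
    events = {}
    for line in lines:
        match = CONN_LOST.search(line.rstrip())
        if match:
            datastores = QUOTED_NAME.findall(match.group("datastores"))
            events[match.group("device")] = (match.group("path"), datastores)
    return events


# The two vobd lines shown above.
vobd_sample = [
    "2017-01-10T13:04:26.905Z: [scsiCorrelator] 475204262us: "
    "[esx.problem.storage.connectivity.lost] Lost connectivity to storage device "
    "naa.6000eb31dffdc33a0000000000000028. Path vmhba33:C0:T1:L0 is down. "
    "Affected datastores: \u201cGreen\u201d.",
    "2017-01-10T13:04:26.905Z: [scsiCorrelator] 475204695us: "
    "[esx.problem.storage.connectivity.lost] Lost connectivity to storage device "
    "naa.6000eb31dffdc33a000000000000002a. Path vmhba33:C0:T0:L0 is down. "
    "Affected datastores: \u201cGrey\u201d.",
]

for device, (path, datastores) in connectivity_lost(vobd_sample).items():
    print(device, path, ", ".join(datastores))
```

In this example both down paths are on vmhba33, the software iSCSI adapter, which fits the single failed iSCSI uplink described above.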

In the next example, datastore Black is in an inactive state.

[Screenshot: Datastore view missing]

If we look into the logs to determine what is going on, we see these events:

vmkernel log

2017-01-09T12:42:09.365Z cpu0:32888)ScsiDevice: 6878: Device naa.6000eb31dffdc33a0000000000000063 APD Notify PERM LOSS; token num:1

2017-01-09T12:42:09.366Z cpu1:32916)StorageApdHandler: 1066: Freeing APD handle 0x430180b88880 [naa.6000eb31dffdc33a0000000000000063]

2017-01-09T12:49:01.260Z cpu1:32786)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0xc1) PDL error (0x5/0x25/0x0) – path vmhba33:C0:T3:L0 device naa.6000eb31dffdc33a0000000000000063 – triggering path evaluation

2017-01-09T12:49:01.260Z cpu1:32786)ScsiDeviceIO: 2651: Cmd(0x439d802ec580) 0xfe, CmdSN 0x4b7 from world 32776 to dev “naa.6000eb31dffdc33a0000000000000063” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

2017-01-09T12:49:01.300Z cpu0:40210)WARNING: NMP: vmk_NmpSatpIssueTUR:1043: Device naa.6000eb31dffdc33a0000000000000063 path vmhba33:C0:T3:L0 has been unmapped from the array

After some time passes, you will see this message:

2017-01-09T13:13:11.942Z cpu0:32872)ScsiDevice: 1718: Permanently inaccessible device :naa.6000eb31dffdc33a0000000000000063 has no more open connections. It is now safe to unmount datastores (if any) and delete the device.

In this case the LUN was unmapped from the array for this host, which is not a transient issue. Sense data 0x5 0x25 0x0 (sense key 0x5, ILLEGAL REQUEST, with ASC/ASCQ 0x25/0x0) corresponds to “LOGICAL UNIT NOT SUPPORTED”, which indicates that the device is in a Permanent Device Loss (PDL) state. Once ESXi knows the device is in a PDL state, it does not wait for the device to return.

ESXi checks only the ASC/ASCQ: if it is 0x25/0x0 (LOGICAL UNIT NOT SUPPORTED) or 0x68/0x0 (LOGICAL UNIT NOT CONFIGURED), ESXi marks the device as PDL.
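The PDL rule above can be sketched as a lookup on the ASC/ASCQ pair. This is only an illustration of the rule as stated in this post, not ESXi's actual code; `parse_sense` and `is_pdl` are hypothetical names, and the parser assumes the `Valid sense data: <key> <asc> <ascq>` layout seen in the ScsiDeviceIO line above.

```python
import re

# ASC/ASCQ pairs that, per this post, make ESXi mark a device as PDL.
# The sense key is deliberately not part of the decision.
PDL_ASC_ASCQ = {
    (0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED",
    (0x68, 0x00): "LOGICAL UNIT NOT CONFIGURED",
}

# Matches e.g. "Valid sense data: 0x5 0x25 0x0" (layout taken from the log above).
SENSE = re.compile(
    r"Valid sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)", re.IGNORECASE
)


def parse_sense(line):
    """Return (sense key, ASC, ASCQ) as integers, or None if absent."""
    match = SENSE.search(line)
    if match is None:
        return None
    return tuple(int(value, 16) for value in match.groups())


def is_pdl(sense):
    """Apply the ASC/ASCQ-only PDL rule described above."""
    _sense_key, asc, ascq = sense
    return (asc, ascq) in PDL_ASC_ASCQ


line = (
    "2017-01-09T12:49:01.260Z cpu1:32786)ScsiDeviceIO: 2651: Cmd(0x439d802ec580) 0xfe, "
    'CmdSN 0x4b7 from world 32776 to dev "naa.6000eb31dffdc33a0000000000000063" '
    "failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0."
)
sense = parse_sense(line)
if sense is not None and is_pdl(sense):
    print("PDL:", PDL_ASC_ASCQ[sense[1:]])  # prints: PDL: LOGICAL UNIT NOT SUPPORTED
```

Ignoring the sense key mirrors the statement above: a device reporting 0x25/0x0 is treated as PDL regardless of the accompanying key.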

VMware KB 2004684 has in-depth information on APD and PDL situations, including planned and unplanned PDL. You can read it here: Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x (2004684)

Further on, in the hostd log, you will see additional events that correlate with the storage connectivity loss. Look for the following event IDs:

Event ID: esx.problem.storage.connectivity.lost


The “esx.problem.storage.connectivity.lost” event indicates a loss of connectivity to the specified storage device. Any virtual machines using the affected datastore may become unresponsive.

Event ID: esx.problem.scsi.device.state.permanentloss


The “esx.problem.scsi.device.state.permanentloss” event indicates a permanent device loss (PDL).
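To check quickly whether either event was logged, a plain substring search over the log is often enough. A minimal Python sketch, assuming only that the event ID appears verbatim on the matching line (the exact hostd log layout is not shown in this post); `count_storage_events` is a hypothetical helper.

```python
from collections import Counter

# Event IDs discussed above, with the meaning given in this post.
STORAGE_EVENTS = {
    "esx.problem.storage.connectivity.lost": "loss of connectivity to a storage device",
    "esx.problem.scsi.device.state.permanentloss": "permanent device loss (PDL)",
}


def count_storage_events(lines):
    """Count the lines that mention each storage event ID (plain substring match)."""
    counts = Counter()
    for line in lines:
        for event_id in STORAGE_EVENTS:
            if event_id in line:
                counts[event_id] += 1
    return counts


log_lines = [
    "2017-01-10T13:04:26.905Z: [scsiCorrelator] 475204262us: "
    "[esx.problem.storage.connectivity.lost] Lost connectivity to storage device "
    "naa.6000eb31dffdc33a0000000000000028. Path vmhba33:C0:T1:L0 is down. "
    "Affected datastores: \u201cGreen\u201d.",
    "an unrelated line",
]

for event_id, count in count_storage_events(log_lines).items():
    print(f"{event_id}: {count} line(s), {STORAGE_EVENTS[event_id]}")
```

Because the match is a plain substring test, it works on any log that carries the event ID text, at the cost of also counting lines that merely mention the ID.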

VSAN Troubleshooting Video Series

VMware Technical Support University is very proud to present our Virtual SAN Troubleshooting Video Series, comprising 26 bite-sized videos that cover not only troubleshooting but also design, requirements, compatibility, and upgrades. These were first presented in the vSAN troubleshooting webinar held on 27 October 2016. This video collection, presented by Francis Daly, is not to be missed!

***UPDATE: The videos have moved. They are now part of a free 60-day trial subscription to the VMware Learning Zone. You can sign up and get access to all these videos (plus lots more) here: https://mylearn.vmware.com/mgrReg/courses.cfm?ui=www_edu&a=det&id_course=316802

  1. Introduction to vSAN Troubleshooting – [1:27]
  2. vSAN Compatibility: Introduction – [2:51]
  3. vSAN Compatibility: vSAN Hardware & Software Requirements – [2:16]
  4. vSAN Compatibility: vSAN Architectural Best Practices – [7:57]
  5. vSAN Compatibility: Is My SSD Supported in vSAN? – [9:20]
  6. vSAN Compatibility: Is My RAID Controller Supported? – [8:00]
  7. Storage Policies: Introduction – [1:28]
  8. Storage Policies: vSAN Storage Policies In-Depth – [2:23]
  9. Storage Policies: What Are Witnesses? – [3:29]
  10. Storage Policies: Component States – [4:59]
  11. Storage Policies: Policy & Object States – [3:22]
  12. Storage Policies: Component Layout FTT1 SW1 – [3:25]
  13. Storage Policies: Component Layout FTT1 SW2 – [3:52]
  14. Storage Policies: Component Layout FTT1 SW1 VMDK 400 – [3:12]
  15. Storage Policies: FTT2 SW2 VMDK100 – [1:58]
  16. Storage Policies: Summary – [3:31]
  17. Common Issues: The Upgrade Process – [2:15]
  18. Common Issues: Upgrade Best Practices – [7:25]
  19. Common Issues: Inaccessible Objects – [3:25]
  20. Common Issues: Creation, Modification of Disk Groups – [3:43]
  21. Common Issues: Capacity – [1:34]
  22. Common Issues: Summary – [1:39]
  23. vSAN Software Components – [7:31]
  24. vSAN Troubleshooting Tools (Part 1) – [7:46]
  25. vSAN Troubleshooting Tools (Part 2) – [9:28]
  26. Summary – vSAN Troubleshooting – [1:23]

vCloud Connector and Datacenter Extension – KBTV Webinars

Continuing with our KBTV Webinars series, here we have the thirteenth and final installment of the current series, titled vCloud Connector and Datacenter Extension.

Note: This video is roughly 35 minutes long, so it would be advisable to set aside some time in order to watch it fully.

KBTV Webinars – How to connect your Datacentre to vCloud Air over IPsec VPN

Continuing with our current series of KBTV Webinars, here we have the eleventh webinar, titled How to connect your Datacentre to vCloud Air over IPsec VPN.

Note: This video is roughly 46 minutes long, so it would be advisable to block out some time in order to watch it fully.

KBTV Webinars – SSL certificate handling in VMware vSphere 6

Continuing with our current series of KBTV Webinars, here we have the tenth webinar, titled SSL certificate handling in VMware vSphere 6. It goes through what has changed since the vSphere 5.5 release and what is new in vSphere 6.

What’s new in VMware vSphere 6 – KBTV Webinars

Continuing with our current series of KBTV Webinars, here we have the ninth webinar, titled What is new in vSphere 6. It goes through what has changed since the vSphere 5.5 release.

NOTE: This video is roughly 35 minutes long, so it would be worth blocking out some time to watch it!

Introduction to VMware NSX – KBTV Webinars

Continuing with our current series of KBTV Webinars, here we have the eighth webinar, titled Introduction to VMware NSX, which provides an overview of what NSX is and how it relates to the SDDC (Software-Defined Datacenter).

NOTE: This video is roughly 37 minutes long, so it would be worth blocking out some time to watch it!

To see details of upcoming webinars in this series, see the Support Insider Blog post: New Free Webinars.

Overview of Storage I/O Control (SIOC) – KBTV Webinars

This video is the seventh in a series of free webinars for you, our customer. This one is titled Overview of Storage I/O Control (SIOC). It goes through what Storage I/O Control is, its requirements, and how it works.

NOTE: This video is roughly 40 minutes long, so it would be worth blocking out some time to watch it!

To see details of upcoming webinars in this series, see the Support Insider Blog post: New Free Webinars.

Maintaining the vCenter Server Database

This video is the seventh in a series of free webinars for you, our customer. This one is titled Maintaining the vCenter Server Database. It provides a high-level overview of the vCenter Server Database architecture, as well as best practices and tips for maintaining the database.

The video is roughly 15 minutes long, so it would be worth blocking out some time to watch it.

To see details of upcoming webinars in this series, see the Support Insider Blog post: New Free Webinars.

Introduction to VMware vRealize Orchestrator 6 – KBTV Webinars

This video is the sixth in a series of free webinars on various topics. This one is titled Introduction to VMware vRealize Orchestrator 6. The video is roughly 40 minutes long, so it would be worth blocking out some time to watch it.

To see the details of upcoming webinars in this series, see the Support Insider Blog post: New Free Webinars.

This presentation was originally broadcast live on Thursday 30th April 2015.