Top 20 NSX KBs for January 2016

  1. Update sequence for vSphere 6.0 and its compatible VMware products
  2. How to license VMware NSX for vSphere 6.x
  3. Downloading and enabling VMware vShield Endpoint 5.x on supported VMware vSphere platforms
  4. Important information before upgrading to vSphere 6.0
  5. NSX is unavailable from the vSphere Web Client Plugin after backup of NSX Manager with quiesced snapshot
  6. Licensing VMware vSphere 5.5.x/6.0.x and VMware NSX for vSphere 6.x
  7. vCenter Server or Platform Services Controller certificate validation error for external VMware Solutions in vSphere 6.0
  8. In VMware NSX for vSphere 6.2.1, adding an ESXi host to the vSphere Distributed Switch fails with the error: Host is not licensed for the VDS feature. Cannot add this host to dvSwitch
  9. Deploying VXLAN through Auto Deploy and VMware NSX for vSphere 6.x
  10. Collecting diagnostic bundles from VMware NSX for Multi-Hypervisor Controller Appliances
  11. Windows virtual machines using the vShield Endpoint TDI Manager or NSX Network Introspection Driver (vnetflt.sys) driver fail with a blue diagnostic screen
  12. Oracle connections time out when forwarded through the VMware NSX for vSphere 6.1.x Edge
  13. Migration of Service VM (SVM) may cause ESXi host issues in VMware NSX for vSphere 6.x
  14. In VMware NSX for vSphere 6.x, adding a second profile to SSL VPN-Plus in Windows 10 fails with the error: SSL VPN-Plus Client could not be upgraded. Try again after rebooting if SSL VPN-Plus Client was already installed on your machine
  15. vShield/NSX Manager Web interface does not update in VMware vCloud Networking and Security 5.x and NSX for vSphere 6.x
  16. Installing VXLAN Agent fails with ESX Agent Manager displaying the error: Agent VIB module not installed
  17. Collecting diagnostic information for VMware NSX for vSphere 6.x
  18. Network Port Requirements for VMware NSX for vSphere 6.x
  19. A virtual machine loses network connectivity during migration between resource pools, clusters or vApps in vCloud Networking and Security 5.1.4, 5.5.2 and NSX for vSphere 6.0.4
  20. Clicking NSX Home in the vSphere Web Client reports the error: No NSX Managers available. Verify current user has role assigned on NSX Manager

Important Updates to NSX for vSphere KB Articles

Here is a list of currently trending KB articles for NSX for vSphere.

NSX Distributed Firewall unfiltered rules may be deleted during the firewall rule publish operation in VMware NSX for vSphere 6.x (2141155)
Symptoms:

  • In Distributed Firewall Apply Filter mode in VMware NSX for vSphere 6.x, when filtered rules belonging to only one firewall section are modified and published (by either Publish Section or Publish Changes), or filtered rules belonging to multiple firewall sections are modified, you experience these symptoms:
  • The publish operation may publish only the filtered rules
  • All the unfiltered rules within that section may be removed

VMware NSX for vSphere 6.2.1 supportability with vRealize Log Insight Reporting (2143058)
Symptoms:

  • NSX for vSphere Log Insight Content Pack (both 1.0 and 2.0) does not work with the NSX for vSphere 6.2 infrastructure release

The netcpa agent on an ESXi host fails to communicate with NSX controller(s) in VMware NSX for vSphere 6.x (2137011)
Symptoms:

  • Routes from dynamic routing protocols may not be present in the VMware NSX for vSphere 6.x Edge Services Gateway (ESG) or Distributed Logical Router (DLR) when the Control VM is running on an affected ESXi host
  • Virtual machines on the affected ESXi host fail to communicate with virtual machines running on other ESXi hosts
  • Running the esxcli network vswitch dvs vmware vxlan network list --vds-name=Name_VDS command on the ESXi host displays the VNIs as down (see the example below)
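
For reference, the command in that last symptom is run directly on the affected ESXi host; here it is written out with the full double-dash flag (the VDS name is a placeholder):

# esxcli network vswitch dvs vmware vxlan network list --vds-name=Name_VDS

A VNI reported as down in this output indicates that the host's netcpa agent has lost its control-plane connection to the NSX controllers.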

NSX User Interface (UI) is grayed out for several seconds and exhibits slow performance on NSX for vSphere 6.2.0 (2141919)
Symptoms:

  • Selecting an item in the vSphere Web Client for NSX appears slow. Specifically, the screen turns grey and pauses for roughly 10 seconds before the corresponding request from the Web Client reaches NSX Manager and is captured in the NSX Manager logs (vsm.log)
  • After this initial slowness, performance is as expected
  • The NSX Manager logs show a roughly five-second lag to complete login and authentication between NSX Manager and vCenter Server

NSX is unavailable from the vSphere Web Client Plugin after backup of NSX Manager with quiesced snapshot (2142263)
Symptoms:

  • NSX is unavailable from the vSphere Web Client plug-in
  • The NSX plug-in User Interface (UI) is reachable but the pages do not display content
  • A restart of the NSX Manager is required to recover from this situation

Top 20 VSAN KBs for January 2016

  1. Following maintenance, hosts in a Virtual SAN cluster may erroneously report the error: Host cannot communicate with all other nodes in virtual SAN enabled cluster
  2. VMware Virtual SAN 6.0 Requirements
  3. Support statement for 512e and 4K Native drives for VMware vSphere and VSAN
  4. Virtual SAN enabled DELL PowerEdge hosts running ESXi 5.5 or 6.x randomly PSOD
  5. vSphere 5.5 Virtual SAN requirements
  6. HP SSD Smart Path and VMware Virtual SAN
  7. Upgrading the VMware Virtual SAN (VSAN) on-disk format version from 1 to 2
  8. Requirements and considerations for the deployment of VMware Virtual SAN (VSAN)
  9. Powering on virtual machines in VMware Virtual SAN 5.5 fails with error: Failed to create swap file
  10. Virtual SAN Health Service – Data Health – Virtual SAN Object Health
  11. Upgrading from VMware vSphere 5.5 EP06 or 5.5 P04 to vSphere 6.0 GA can cause VMware Virtual SAN Data Unavailability
  12. VMware Virtual SAN 6.1 or 5.5 Update 3 Disk Groups show as Unmounted in the vSphere Web Client
  13. Avoiding a known drive failure issue when Dell PERC H730 controller is used with VMware Virtual SAN 5.5 or 6.0
  14. VSAN Multicast performance test fails
  15. VMware vSphere 6.x VSAN Health Service Status Reports fails with the error: Unknown (Issues connecting to EAM. Try restarting it.)
  16. Adding a host back to a Virtual SAN cluster after an ESXi host rebuild
  17. Enabling or capturing performance statistics using Virtual SAN Observer for VMware Virtual SAN
  18. Removing a VMware Virtual SAN-enabled cluster and detaching the VMware ESXi host from the cluster fails
  19. ESXi 5.5 host in a VSAN cluster reports a warning alarm: Host vendor provider registration
  20. Virtual SAN Health Service – Network Health – Unexpected VSAN cluster members

Top 20 vRealize Operations Manager KBs for January 2016

  1. Update sequence for vSphere 6.0 and its compatible VMware products
  2. vRealize Operations Manager 6.1 and 6.2 Sizing Guidelines
  3. Resetting the Administrator password in VMware vRealize Operations Manager appliance
  4. Upgrading VMware vRealize Operations Manager 6.1 fails on one or more nodes with the error: PAK install failed
  5. Collecting diagnostic information from VMware vRealize Operations Manager (vApp) 6.0.x
  6. Restarting VMware vRealize Operations Manager 6.0.x fails with a Waiting for Analytics message in the Admin UI
  7. Adding additional storage to the VMware vRealize Operations Manager Analytics or UI virtual machine
  8. Upgrading VMware vRealize Operations Manager to version 6.1 stops at Installation in Progress
  9. VMware vRealize Operations Manager 6.1.x upgrade hangs if expired projects are present
  10. After applying the VMware vRealize Operations Manager 6.1 hotfix 1 (build 3136126), the Licensing tab reports an internal error
  11. vRealize Operations Manager Sizing Guidelines
  12. Replacing or repairing VMware vSphere solution dashboards
  13. Cluster Status is stuck at Going Online in VMware vRealize Operations Manager 6.0.x
  14. After upgrading to VMware vRealize Operations Manager for Horizon 6.1, pairing the Broker Agent with the Horizon adapter fails with the error: BrokerPoll message sending error: javax.naming.NameNotFoundException
  15. Manually removing a node from the VMware vRealize Operations Manager 6.x cluster
  16. In VMware vRealize Operations Manager 6.0.x, no metrics are collected after running the Get Desktop Process action
  17. Logging in to VMware vRealize Operations Manager or viewing dashboards fails with the blob error: Could not read the file from main and replica copies for blob
  18. How to enable/disable HA when the option is not available via the vRealize Operations Manager 6.x Admin UI
  19. The Collection State of the View Adapter instance appears to be Failed in VMware vRealize Operations Manager for Horizon 6.0
  20. Creating low disk space alerts for the virtual machine guest file systems in VMware vRealize Operations Manager 6.0.x

Top 20 vRealize Automation KBs for January 2016

  1. Various tasks fail after upgrading or migrating to VMware vRealize Automation 6.1.x or 6.2
  2. VMware vRealize Automation workflows fail with the error: System.Xaml.XamlObjectWriterException: Failed to create a “PointCollection”
  3. Migrating to a new SSO or recovering from a reinstallation of SSO in VMware vRealize Automation
  4. Logging in to VMware vRealize Automation 6.2 using Native Active Directory fails with the error: Incorrect username/password
  5. Installing VMware vRealize Automation 6.2.x IaaS fails with the error: exited with code -1
  6. Provisioning a machine using VMware vRealize Automation fails with the error: Error executing query usp=SelectHostReservation
  7. Troubleshooting Model Manager Data installation failures in VMware vRealize Automation
  8. Network profile is not assigned to new virtual machines in VMware vRealize Automation 6.1.x and 6.2.x
  9. Load balancing requirements for VMware vRealize Automation (multiple versions)
  10. Using JXplorer to update the LDAP string for an identity source for VMware vRealize Automation 6.0.x, 6.1.x
  11. Removing a virtual machine from Management in VMware vRealize Automation or vCloud Automation
  12. Cannot find template when creating a blueprint in VMware vRealize Automation 6.0.x using the CloneWorkflow provisioning workflow
  13. Customizing email templates in VMware vRealize Automation (formerly known as VMware vCloud Automation Center)
  14. Setting the Kerberos token size for VMware vRealize Automation 6.x deployments
  15. Moving the VMware vRealize Automation 6.x SQL database to a new server
  16. Guest customization fails when deploying virtual machines through VMware vRealize Automation that have multiple NICs attached
  17. Connecting the guest agent to IaaS on Windows 2012 fails with the error: Client certificate chain file not specified
  18. Configuring SSO settings in VMware vRealize Automation 6.1.x and 6.2.x fails with the error: Trying to access remote SSO on host %host% and port 7444, but the returned host is %differenthost% and port 7444
  19. Changing the VMware vRealize Automation IaaS service account or updating the vRealize Automation IaaS service account password
That’s all this month!

Top 20 vCenter KBs for January 2016

  1. Purging old data from the database used by VMware vCenter Server
  2. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
  3. Unlocking and resetting the VMware vCenter Single Sign-On administrator password
  4. Upgrading to vCenter Server 6.0 best practices
  5. Updating vCenter Server 5.5 to Update 3 fails with the error: Warning 32014. A utility for phone home data collector couldn't be executed successfully
  6. Determining where growth is occurring in the VMware vCenter Server database
  7. The Host IPMI System Event Log Status alarm is triggered repeatedly in VMware vCenter Server 4.x and 5.x
  8. List of recommended topologies for VMware vSphere 6.0.x
  9. Determining why a virtual machine was powered off or restarted
  10. Migrating the vCenter Server database from SQL Express to full SQL Server
  11. Powering on a virtual machine fails after a storage outage with the error: could not open/create change tracking file
  12. Linked Clone pool creation and recomposition fails with VMware Horizon View 6.1.x and older releases
  13. Cannot take a quiesced snapshot of Windows 2008 R2 virtual machine
  14. Consolidating snapshots in vSphere 5.x/6.0
  15. vCenter Server 5.1/5.5/6.0 performance charts report dropped network packets
  16. The vpxd process becomes unresponsive after upgrading to VMware vCenter Server 5.5
  17. Update sequence for vSphere 6.0 and its compatible VMware products
  18. Registering or adding a virtual machine to the inventory on vCenter Server or on an ESX/ESXi host
  19. System logs are stored on non-persistent storage
  20. vCenter Server 5.5 fails to start after reboot with the error: Unable to create SSO facade: Invalid response code: 404 Not Found

Top 20 ESXi articles for January 2016

  1. VMware ESXi 5.x host experiences a purple diagnostic screen mentioning E1000PollRxRing and E1000DevRx
  2. Determining Network/Storage firmware and driver version in ESXi/ESX 4.x, ESXi 5.x, and ESXi 6.x
  3. Commands to monitor snapshot deletion in VMware ESXi/ESX
  4. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
  5. Downloading and installing async drivers in VMware ESXi 5.x and ESXi 6.0.x
  6. Backing up a Changed Block Tracking enabled virtual machine in ESXi 6.0.x returns incorrect changed sectors
  7. Recreating a missing virtual machine disk descriptor file
  8. Installing patches on an ESXi 5.x/6.x host from the command line
  9. ESXi host cannot initiate vMotion or enable services and reports the error: Heap globalCartel-1 already at its maximum size. Cannot expand
  10. Powering off a virtual machine on an ESXi host
  11. Using a Dell Perc H730 controller in an ESXi 5.5 or ESXi 6.0 host displays IO failures or aborts, and reports unhealthy VSAN disks
  12. Estimating the time required to consolidate snapshots for VMware ESX and VMware ESXi
  13. Information about the error: state in doubt; requested fast path state update
  14. Restarting the Management agents on an ESXi or ESX host
  15. The Host IPMI System Event Log Status alarm is triggered repeatedly in VMware vCenter Server 4.x and 5.x
  16. Committing snapshots when there are no snapshot entries in the Snapshot Manager
  17. NETDEV WATCHDOG timeout error and ESXi 6.0 loses network connectivity
  18. Determining why a virtual machine was powered off or restarted
  19. Powering on a virtual machine fails after a storage outage with the error: could not open/create change tracking file
  20. Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions)

Top 20 Articles for January 2016

Here is our Top 20 KB list for January 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Commands to monitor snapshot deletion in VMware ESXi/ESX
  2. Manually deleting linked clones or stale virtual desktop entries from the View Composer database in VMware View Manager and VMware Horizon View
  3. VMware ESXi 5.x host experiences a purple diagnostic screen mentioning E1000PollRxRing and E1000DevRx
  4. Determining Network/Storage firmware and driver version in ESXi/ESX 4.x, ESXi 5.x, and ESXi 6.x
  5. Recreating a missing virtual machine disk descriptor file
  6. Licensing vRealize Operations Manager 5.x
  7. Consolidating snapshots in vSphere 5.x/6.0
  8. Estimating the time required to consolidate snapshots for VMware ESX and VMware ESXi
  9. Following maintenance, hosts in a Virtual SAN cluster may erroneously report the error: Host cannot communicate with all other nodes in virtual SAN enabled cluster
  10. Linked Clone pool creation and recomposition fails with VMware Horizon View 6.1.x and older releases
  11. Delete all Snapshots and Consolidate Snapshots feature FAQ
  12. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
  13. Downloading and installing async drivers in VMware ESXi 5.x and ESXi 6.0.x
  14. Committing snapshots when there are no snapshot entries in the Snapshot Manager
  15. Information about the error: state in doubt; requested fast path state update
  16. Update sequence for VMware vSphere 5.5 and its compatible VMware products
  17. Using Microsoft Certreq to generate signed SSL certificates in VMware Horizon View
  18. Deprecated VMFS volume warning reported by ESXi hosts
  19. Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions)
  20. Configuring a diagnostic coredump partition on an ESXi 5.x host

View Composer DiskFault: Disk customization failed due to an internal error

There’s a glitch some customers are encountering whereby a Linked Clone desktop is created on vCenter OK, but is deleted soon after when vCenter complains about either “disposable” or “internal” .vmdk files for this desktop. Creating and recomposing linked clone desktops fails in VMware Horizon View 6.1.x and all older releases after an upgrade patch has been applied to the ESXi hosts.

In View Administrator, recomposing a pool fails with the error:

View Composer DiskFault: Disk customization failed due to an internal error

This issue occurs when versions of View Composer older than VMware Horizon 6.2 attempt to communicate with VMware ESXi hosts on which SSL v3 is disabled; SSL v3 is disabled by default beginning with ESXi 5.5 Update 3b and ESXi 6.0 Update 1.
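
One quick way to confirm whether a given ESXi host still accepts SSL v3 is to attempt an SSLv3-only handshake from another machine (a sketch; the host name is a placeholder, and it requires an OpenSSL build that still supports the -ssl3 option):

# openssl s_client -connect esxi01.example.com:443 -ssl3

On ESXi 5.5 Update 3b and 6.0 Update 1 hosts the handshake should fail with an SSL alert, confirming that SSL v3 is disabled.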

We’ve created a KB article to address this scenario here: Linked Clone pool creation and recomposition fails with VMware Horizon View 6.1.x and older releases (2133018)

Note: The KB provides a workaround, but using it is not advised: it makes VMware ESXi 5.5/6.0 hosts vulnerable to the security threats reported for SSL v3.

Host disconnected from vCenter and VMs showing as inaccessible

Another deep-dive troubleshooting blog today from Nathan Small (twitter account: @vSphereStorage)
 
Description from customer:
 
Host is getting disconnected from vCenter and VMs are showing as inaccessible. Only one host is affected.
 
 
Analysis:
 
A quick review of the vmkernel log shows a log spew of H:0x7 errors to numerous LUNs. Here is a short snippet where you can see how frequently they are occurring (multiple times per second):
 
# cat /var/log/vmkernel.log
 
2016-01-13T18:54:42.994Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x412540b96e80) 0x28, CmdSN 0x8000006b from world 11725 to dev "naa.600601601b703400a4f90c3d0668e311" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.027Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x4125401b2580) 0x28, CmdSN 0x8000002e from world 11725 to dev "naa.600601601b70340064a24ada10fae211" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.030Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x4125406d5380) 0x28, CmdSN 0x80000016 from world 11725 to dev "naa.600601601b7034000c70e4e610fae211" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.542Z cpu67:8259)ScsiDeviceIO: 2326: Cmd(0x412540748800) 0x28, CmdSN 0x80000045 from world 11725 to dev "naa.600601601b70340064a24ada10fae211" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.808Z cpu74:8266)ScsiDeviceIO: 2326: Cmd(0x412541229040) 0x28, CmdSN 0x8000003c from world 11725 to dev "naa.600601601b7034008e56670a11fae211" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.088Z cpu38:8230)ScsiDeviceIO: 2326: Cmd(0x4124c0ff4f80) 0x28, CmdSN 0x80000030 from world 11701 to dev "naa.600601601b703400220f77ab15fae211" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.180Z cpu74:8266)ScsiDeviceIO: 2326: Cmd(0x412540ccda80) 0x28, CmdSN 0x80000047 from world 11725 to dev "naa.600601601b70340042b582440668e311" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.741Z cpu61:8253)ScsiDeviceIO: 2326: Cmd(0x412540b94480) 0x28, CmdSN 0x80000051 from world 11725 to dev "naa.600601601b70340060918f5b0668e311" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.897Z cpu63:8255)ScsiDeviceIO: 2326: Cmd(0x412540ff3180) 0x28, CmdSN 0x8000007a from world 11725 to dev "naa.600601601b7034005c918f5b0668e311" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.355Z cpu78:8270)ScsiDeviceIO: 2326: Cmd(0x412540f3b2c0) 0x28, CmdSN 0x80000039 from world 11725 to dev "naa.600601601b70340060918f5b0668e311" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.522Z cpu70:8262)ScsiDeviceIO: 2326: Cmd(0x41254073d0c0) 0x28, CmdSN 0x8000002c from world 11725 to dev "naa.600601601b7034000e3e97350668e311" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.584Z cpu71:8263)ScsiDeviceIO: 2326: Cmd(0x412541021780) 0x28, CmdSN 0x80000067 from world 11725 to dev "naa.600601601b7034000e3e97350668e311" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.803Z cpu63:8255)ScsiDeviceIO: 2326: Cmd(0x412540d20480) 0x28, CmdSN 0x80000019 from world 11725 to dev "naa.600601601b703400d24fc7620668e311" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:46.253Z cpu74:8266)ScsiDeviceIO: 2326: Cmd(0x412540b96380) 0x28, CmdSN 0x8000006f from world 11725 to dev "naa.600601601b7034005e918f5b0668e311" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
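
To quantify the spew per device, a one-liner like this can be used on the host (illustrative; adjust the path if the log has rotated):

# grep 'failed H:0x7' /var/log/vmkernel.log | sed 's/.*dev "\(naa\.[^"]*\)".*/\1/' | sort | uniq -c | sort -rn

This prints a count of H:0x7 failures per NAA device, which makes it easy to see whether the errors are concentrated on particular LUNs or spread across all of them.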
 
In these log lines, H:, D:, and P: are the host (initiator), device, and plugin status fields. The host-side error (H:0x7) literally translates to Storage Initiator Error, which makes it sound like there is something physically wrong with the card. One needs to understand, though, that this status is sent up the stack by the HBA driver, so it is really up to those who write the driver to decide which conditions produce it. As there are no accompanying errors from the HBA driver, which in this case is a Brocade HBA, this is all we have to work with without enabling verbose logging in the driver. Verbose logging requires a reboot, so it is not always an option when investigating root cause; the exception is when the issue is ongoing, in which case rebooting a host to capture this data is viable.
 
Taking a LUN as an example from ‘esxcfg-mpath -b’ output to get a view of the paths and targets:
 
# esxcfg-mpath -b
 
naa.600601601b703400b6aa124c0668e311 : DGC Fibre Channel Disk (naa.600601601b703400b6aa124c0668e311)
   vmhba0:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba0:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
   vmhba2:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba2:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
Let’s look at the adapter statistics for all HBAs. I would recommend always using localcli over esxcli when troubleshooting, as esxcli requires hostd to be functioning properly:
 
# localcli storage core adapter stats get
 
vmhba0:
   Successful Commands: 844542177
   Blocks Read: 243114868277
   Blocks Written: 25821448417
   Read Operations: 395494703
   Write Operations: 405753901
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 35403
   Failed Blocks Read: 57744
   Failed Blocks Written: 16843
   Failed Read Operations: 8224
   Failed Write Operations: 16450
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
vmhba1:
   Successful Commands: 502595840 <-- Far fewer successful commands than on the other adapters
   Blocks Read: 116436597821
   Blocks Written: 16509939615
   Read Operations: 216572537
   Write Operations: 245276523
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 10942696
   Failed Blocks Read: 12055379188 <-- 12 billion failed blocks read! The other adapters are all under 60,000
   Failed Blocks Written: 933809
   Failed Read Operations: 10895926
   Failed Write Operations: 25645
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
vmhba2:
   Successful Commands: 845976973
   Blocks Read: 244034940187
   Blocks Written: 26063852941
   Read Operations: 397564994
   Write Operations: 407538414
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 40468
   Failed Blocks Read: 44157
   Failed Blocks Written: 18676
   Failed Read Operations: 5506
   Failed Write Operations: 12152
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
vmhba3:
   Successful Commands: 866718515
   Blocks Read: 249837164491
   Blocks Written: 26492209531
   Read Operations: 406367844
   Write Operations: 416901703
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 37723
   Failed Blocks Read: 23191
   Failed Blocks Written: 139380
   Failed Read Operations: 7372
   Failed Write Operations: 14878
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
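
A quick bit of arithmetic on these counters makes the imbalance obvious (ratios computed from the numbers above):

   vmhba1: 10942696 failed / 502595840 successful commands ≈ 2.2% failure rate
   vmhba0:    35403 failed / 844542177 successful commands ≈ 0.004% failure rate

vmhba2 and vmhba3 are in the same ballpark as vmhba0, so vmhba1 is failing commands at roughly five hundred times the rate of its peers.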
 
 
Let’s see how often the vmkernel.log reports messages for that HBA:
 
# cat vmkernel.log |grep vmhba0|wc -l
112
 
# cat vmkernel.log |grep vmhba1|wc -l
8474 <-- This HBA is mentioned over 8000 times! Not every mention is an error, of course, but given the log spew we already know is occurring, most of them likely are
 
# cat vmkernel.log |grep vmhba2|wc -l
222
 
# cat vmkernel.log |grep vmhba3|wc -l
335
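
The same counts can be collected in one pass with a small shell loop (illustrative; run from the directory containing the log copy):

# for hba in vmhba0 vmhba1 vmhba2 vmhba3; do echo -n "$hba: "; grep -c $hba vmkernel.log; done
vmhba0: 112
vmhba1: 8474
vmhba2: 222
vmhba3: 335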
 
Now let’s take a look at the zoning to see whether multiple adapters are zoned to the exact same array targets (WWPNs), in an attempt to determine whether the issue is on the array side or the HBA side:
 
# esxcfg-mpath -b
 
naa.600601601b703400b6aa124c0668e311 : DGC Fibre Channel Disk (naa.600601601b703400b6aa124c0668e311)
   vmhba0:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba0:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
   vmhba2:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba2:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
Let’s isolate the HBAs so it is easier to visually compare the WWPNs of the array targets:
 
vmhba1:
 
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
vmhba3:
 
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
vmhba1 and vmhba3 are zoned to the exact same array ports yet only vmhba1 is experiencing communication issues/errors.
 
 
Let’s look at the driver information under /proc/scsi/bfa/ by viewing (cat) the node information:
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 0
Serial Num: xxxxxxxxx32
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:9a
WWPN: 20:01:74:86:7a:ae:1c:9a
Instance num: 0
Target ID: 0 WWPN: 50:06:01:6b:47:20:7b:04
Target ID: 1 WWPN: 50:06:01:6b:47:20:7a:a8
Target ID: 2 WWPN: 50:06:01:63:47:20:7b:04
Target ID: 3 WWPN: 50:06:01:63:47:20:7a:a8
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 1
Serial Num: xxxxxxxxx32
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:9c
WWPN: 20:01:74:86:7a:ae:1c:9c
Instance num: 1
Target ID: 0 WWPN: 50:06:01:60:47:24:7b:04
Target ID: 1 WWPN: 50:06:01:68:47:24:7b:04
Target ID: 3 WWPN: 50:06:01:60:47:24:7a:a8
Target ID: 2 WWPN: 50:06:01:68:47:24:7a:a8
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 2
Serial Num: xxxxxxxxx2E
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:32
WWPN: 20:01:74:86:7a:ae:1c:32
Instance num: 2
Target ID: 0 WWPN: 50:06:01:6b:47:20:7b:04
Target ID: 1 WWPN: 50:06:01:6b:47:20:7a:a8
Target ID: 2 WWPN: 50:06:01:63:47:20:7b:04
Target ID: 3 WWPN: 50:06:01:63:47:20:7a:a8
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 3
Serial Num: xxxxxxxxx2E
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:34
WWPN: 20:01:74:86:7a:ae:1c:34
Instance num: 3
Target ID: 0 WWPN: 50:06:01:60:47:24:7b:04
Target ID: 1 WWPN: 50:06:01:68:47:24:7b:04
Target ID: 2 WWPN: 50:06:01:68:47:24:7a:a8
Target ID: 3 WWPN: 50:06:01:60:47:24:7a:a8
 
So all HBAs are running the same firmware, which is important from a consistency-of-observations perspective. Had the firmware versions been different, there might have been something to go on, or at least a specific firmware level to check for known issues. Obviously they are all using the same driver as well, since only one is loaded in the kernel.
 
We can see, not only from the shared serial numbers above but also from the lspci output, that these are 2-port physical cards:
 
# lspci
 
000:007:00.0 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba0]
000:007:00.1 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba1]
000:009:00.0 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba2]
000:009:00.1 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba3]
 
The first set of numbers is read as Domain:Bus:Slot.Function, so vmhba0 and vmhba1 are both on domain 0, bus 7, slot 0, at functions 0 and 1 respectively, which means they are the two ports of a dual-port HBA. (For example, 000:007:00.1 breaks down as domain 000, bus 007, slot 00, function 1, which is vmhba1.)
 
So vmhba0 and vmhba1 are the same physical card, yet only vmhba1 is showing errors. The HBA chips on a dual-port card are mostly independent of each other, so at least this suggests there isn’t a problem with the board or circuitry they both share. I say mostly because, while the physical ports and HBA chips are independent of each other, they do share the same physical board and the same connection to the motherboard.
 
This host is running EMC PowerPath VE, so we know that, in general, the I/O load is distributed evenly across all HBAs and paths. I say in general because PowerPath VE is intelligent enough to use paths that exhibit more errors, or higher latency, less frequently than other paths.
 
I believe we may be looking at either a cable issue (loose, faulty, or bad GBIC) between vmhba1 and the switch or the switch port itself that vmhba1 is connected to. Here is why:
 
1. vmhba1 is seeing thousands upon thousands of errors while the other HBAs are very quiet
2. vmhba1 and vmhba3 are zoned to the exact same targets yet only vmhba1 is seeing errors
3. vmhba0 and vmhba1 are the same physical card yet only vmhba1 is seeing errors
 
My recommendation would be to check the physical switch port error counters and possibly replace the cable to see if the errors subside. It is standard practice to reset the switch counters and then monitor them, so it may be necessary to do that to validate that CRC or other fabric errors are still occurring.
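
If the fabric switches happen to be Brocade, for example, the counters can be reviewed and cleared along these lines (assuming Brocade Fabric OS; command names differ on other vendors’ switches, and the port number is a placeholder):

switch:admin> porterrshow
switch:admin> portstatsclear 5

porterrshow lists counters such as crc_err and enc_out per port, so a climbing crc_err on the port facing vmhba1 would point at the cable, the SFP, or the switch port itself.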
 
Cheers,
Nathan (twitter account: @vSphereStorage)