Category Archives: Highlights

Path failover may not be successful when using Cisco MDS Switches on NX-OS 7.3 and FCoE based HBAs

So I wanted to get this blog post out sooner rather than later, as it might affect a significant number of customers. In a nutshell, if you perform array maintenance that requires you to reboot a storage controller, the probability of successful path failover is low. This is effectively due to stale entries in the Fibre Channel name server on Cisco MDS switches running NX-OS 7.3, which is a rather new code release. As the title suggests, this only affects FCoE HBAs, specifically ones that rely on our libfc/libfcoe stack for FCoE connectivity. Such HBAs include Cisco fnic HBAs as well as a handful of Emulex FCoE HBAs and a couple of others.

Here is an example of a successful path failover after the host receives an RSCN (Registered State Change Notification) from the array controller following a controller reboot:

2016-07-07T17:36:34.230Z cpu17:33461)<6>host4: disc: Received an RSCN event
 2016-07-07T17:36:34.230Z cpu17:33461)<6>host4: disc: Port address format for port (e50800)
 2016-07-07T17:36:34.230Z cpu17:33461)<6>host4: disc: RSCN received: not rediscovering. redisc 0 state 9 in_prog 0
 2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: disc: GPN_ID rejected reason 9 exp 1
 2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: rport e50800: Remove port
 2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: rport e50800: Port entered LOGO state from Ready state
 2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: rport e50800: Delete port
 2016-07-07T17:36:34.231Z cpu54:33448)<6>host4: rport e50800: work event 3
 2016-07-07T17:36:34.231Z cpu54:33448)<7>fnic : 4 :: fnic_rport_exch_reset called portid 0xe50800
 2016-07-07T17:36:34.231Z cpu54:33448)<7>fnic : 4 :: fnic_rport_reset_exch: Issuing abts
 2016-07-07T17:36:34.231Z cpu54:33448)<6>host4: rport e50800: Received a LOGO response closed
 2016-07-07T17:36:34.231Z cpu54:33448)<6>host4: rport e50800: Received a LOGO response, but in state Delete
 2016-07-07T17:36:34.231Z cpu54:33448)<6>host4: rport e50800: work delete

Here is a breakdown of what you just read:

  1. RSCN is received from the array controller
  2. Operation is now in state = 9
  3. GPN_ID (Get Port Name ID) is issued to the switches but is rejected because the state is 9 (See http://lists.open-fcoe.org/pipermail/fcoe-devel/2009-June/002828.html)
  4. LibFC begins to remove the port information on the host
  5. Port enters LOGO (Logout) state from previous state, which was Ready
  6. LibFC Deletes the port information
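When reviewing a long vmkernel log, it can help to pull out just the remote port state transitions. The following is a minimal sketch, assuming the libfc message format shown in the excerpt above; the function name `rport_transitions` and the regular expression are illustrative and not part of any VMware tool.

```python
import re

# Matches libfc lines such as:
#   "...<6>host4: rport e50800: Port entered LOGO state from Ready state"
# The pattern is an assumption based on the log excerpt in this post.
TRANSITION = re.compile(
    r"host(?P<host>\d+): rport (?P<port>[0-9a-f]+): "
    r"Port entered (?P<new>\w+) state from (?P<old>\w+) state"
)


def rport_transitions(lines):
    """Return (host, port_id, old_state, new_state) tuples in log order."""
    found = []
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            found.append((int(m["host"]), m["port"], m["old"], m["new"]))
    return found


sample = [
    "2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: rport e50800: Remove port",
    "2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: rport e50800: Port entered LOGO state from Ready state",
    "2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: rport e50800: Delete port",
]

for host, port, old, new in rport_transitions(sample):
    print(f"host{host} rport {port}: {old} -> {new}")
```

In the healthy case you would expect a single Ready to LOGO transition followed by the port being deleted.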

After this, the ESX host will fail over to other available ports, which would be on the peer SP:

2016-07-07T17:36:44.233Z cpu33:33459)<3> rport-4:0-1: blocked FC remote port time out: saving binding
 2016-07-07T17:36:44.233Z cpu55:33473)<7>fnic : 4 :: fnic_terminate_rport_io called wwpn 0x524a937aeb740513, wwnn0xffffffffffffffff, rport 0x0x4309b72f3c50, portid 0xffffffff
 2016-07-07T17:36:44.257Z cpu52:33320)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x2a (0x43a659d15bc0, 36277) to dev "naa.624a93704d1296f5972642ea0001101c" on path "vmhba3:C0:T0:L1" Failed: H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:FAILOVER

A Host status of H:0x1 means NO_CONNECT, hence the failover.
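To check many NMP messages at once, the status fields can be extracted with a small parser. This is only a sketch, assuming the `nmp_ThrottleLogForDevice` line format shown above; `parse_nmp_failure` is a hypothetical helper, and the lookup table deliberately contains only the one host status discussed in this post.

```python
import re

# Pattern assumed from the NMP log line quoted in this post.
STATUS = re.compile(
    r'on path "(?P<path>[^"]+)" Failed: '
    r"H:(?P<host>0x[0-9a-fA-F]+) D:(?P<device>0x[0-9a-fA-F]+) "
    r"P:(?P<plugin>0x[0-9a-fA-F]+).*?Act:(?P<action>\w+)"
)

# Only the host status covered in this post is listed; extend as needed.
HOST_STATUS_NAMES = {0x1: "NO_CONNECT"}


def parse_nmp_failure(line):
    """Return the path, H/D/P status codes, and action, or None if no match."""
    m = STATUS.search(line)
    if m is None:
        return None
    host = int(m["host"], 16)
    return {
        "path": m["path"],
        "host_status": host,
        "host_status_name": HOST_STATUS_NAMES.get(host, "UNKNOWN"),
        "device_status": int(m["device"], 16),
        "plugin_status": int(m["plugin"], 16),
        "action": m["action"],
    }


line = (
    '2016-07-07T17:36:44.257Z cpu52:33320)NMP: nmp_ThrottleLogForDevice:3298: '
    'Cmd 0x2a (0x43a659d15bc0, 36277) to dev "naa.624a93704d1296f5972642ea0001101c" '
    'on path "vmhba3:C0:T0:L1" Failed: H:0x1 D:0x0 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0. Act:FAILOVER'
)
print(parse_nmp_failure(line))
```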

Now here is an example of the same operation on a Cisco MDS switch running NX-OS 7.3 when a storage controller on the array is rebooted:

2016-07-14T19:02:03.551Z cpu47:33448)<6>host2: disc: Received an RSCN event
 2016-07-14T19:02:03.551Z cpu47:33448)<6>host2: disc: Port address format for port (e50900)
 2016-07-14T19:02:03.551Z cpu47:33448)<6>host2: disc: RSCN received: not rediscovering. redisc 0 state 9 in_prog 0
 2016-07-14T19:02:03.557Z cpu47:33444)<6>host2: rport e50900: ADISC port
 2016-07-14T19:02:03.557Z cpu47:33444)<6>host2: rport e50900: sending ADISC from Ready state
 2016-07-14T19:02:23.558Z cpu47:33448)<6>host2: rport e50900: Received a ADISC response
 2016-07-14T19:02:23.558Z cpu47:33448)<6>host2: rport e50900: Error 1 in state ADISC, retries 0
 2016-07-14T19:02:23.558Z cpu47:33448)<6>host2: rport e50900: Port entered LOGO state from ADISC state
 2016-07-14T19:02:43.560Z cpu2:33442)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:02:43.560Z cpu2:33442)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:02:43.560Z cpu58:33446)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:03:03.563Z cpu54:33449)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:03:03.563Z cpu54:33449)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:03:03.563Z cpu2:33442)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:03:23.565Z cpu32:33447)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:03:23.565Z cpu32:33447)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:03:23.565Z cpu54:33449)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:03:43.567Z cpu50:33445)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:03:43.567Z cpu50:33445)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:03:43.567Z cpu32:33447)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:04:03.568Z cpu54:33443)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:04:03.568Z cpu54:33443)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:04:03.569Z cpu32:33472)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:04:43.573Z cpu20:33473)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:04:43.573Z cpu20:33473)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:04:43.573Z cpu54:33443)<6>host2: rport e50900: Port entered LOGO state from LOGO state

Notice the difference? Here is a breakdown of what happened this time:

  1. RSCN is received from the array controller
  2. Operation is now in state = 9
  3. GPN_ID (Get Port Name ID) is issued to the switches but is NOT rejected
  4. Since GPN_ID is valid, LibFC issues an Address Discovery (ADISC)
  5. 20 seconds later the ADISC times out, the port enters the LOGO state, and each LOGO attempt then times out and is retried roughly every 20 seconds
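One way to spot this pattern in your own logs is to count repeated LOGO retries per remote port. The sketch below assumes the libfc message text shown in the excerpt above; `stuck_logo_ports` and its `threshold` of three retries are illustrative choices, not values taken from VMware or Cisco documentation.

```python
import re
from collections import Counter

# Pattern assumed from the "Error -1 in state LOGO, retrying" lines above.
LOGO_RETRY = re.compile(
    r"host(?P<host>\d+): rport (?P<port>[0-9a-f]+): "
    r"Error -1 in state LOGO, retrying"
)


def stuck_logo_ports(lines, threshold=3):
    """Return {(host, port_id): retries} for ports with >= threshold LOGO retries."""
    counts = Counter()
    for line in lines:
        m = LOGO_RETRY.search(line)
        if m:
            counts[(int(m["host"]), m["port"])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


sample = [
    "2016-07-14T19:02:43.560Z cpu2:33442)<6>host2: rport e50900: Error -1 in state LOGO, retrying",
    "2016-07-14T19:03:03.563Z cpu54:33449)<6>host2: rport e50900: Error -1 in state LOGO, retrying",
    "2016-07-14T19:03:23.565Z cpu32:33447)<6>host2: rport e50900: Error -1 in state LOGO, retrying",
    "2016-07-14T19:03:23.565Z cpu54:33449)<6>host2: rport e50900: Port entered LOGO state from LOGO state",
]
print(stuck_logo_ports(sample))
```

A port that shows up here never reached the Delete step seen in the healthy example, so its paths are not failed over.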

The problem is that the port stays stuck in this retry loop until the array controller completes the reboot and is back online:

2016-07-14T19:04:47.276Z cpu56:33451)<6>host2: disc: Received an RSCN event
 2016-07-14T19:04:47.276Z cpu56:33451)<6>host2: disc: Port address format for port (e50900)
 2016-07-14T19:04:47.276Z cpu56:33451)<6>host2: disc: RSCN received: not rediscovering. redisc 0 state 9 in_prog 0
 2016-07-14T19:04:47.277Z cpu20:33454)<6>host2: rport e50900: Login to port
 2016-07-14T19:04:47.277Z cpu20:33454)<6>host2: rport e50900: Port entered PLOGI state from LOGO state
 2016-07-14T19:04:47.278Z cpu57:33456)<6>host2: rport e50900: Received a PLOGI accept
 2016-07-14T19:04:47.278Z cpu57:33456)<6>host2: rport e50900: Port entered PRLI state from PLOGI state
 2016-07-14T19:04:47.278Z cpu52:33458)<6>host2: rport e50900: Received a PRLI accept
 2016-07-14T19:04:47.278Z cpu52:33458)<6>host2: rport e50900: PRLI spp_flags = 0x21
 2016-07-14T19:04:47.278Z cpu52:33458)<6>host2: rport e50900: Port entered RTV state from PRLI state
 2016-07-14T19:04:47.278Z cpu57:33452)<6>host2: rport e50900: Received a RTV reject
 2016-07-14T19:04:47.278Z cpu57:33452)<6>host2: rport e50900: Port is Ready
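To put a number on how long the port was unavailable, you can subtract the timestamp of the first RSCN from the timestamp of the final "Port is Ready" message. This is a rough sketch that assumes every line begins with the UTC timestamp format shown in these excerpts; the helper names are hypothetical.

```python
from datetime import datetime


def log_time(line):
    """Parse the leading UTC timestamp of a vmkernel log line."""
    stamp = line.split()[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def seconds_between(first_line, last_line):
    """Elapsed seconds between the timestamps of two log lines."""
    return (log_time(last_line) - log_time(first_line)).total_seconds()


rscn = "2016-07-14T19:02:03.551Z cpu47:33448)<6>host2: disc: Received an RSCN event"
ready = "2016-07-14T19:04:47.278Z cpu57:33452)<6>host2: rport e50900: Port is Ready"
print(round(seconds_between(rscn, ready), 3))
```

For the excerpts in this post, that works out to roughly 164 seconds between the first RSCN and the port returning to Ready.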

What is actually happening here is that the Cisco MDS switches are quick to receive the RSCN from the array controller and pass it along to the host HBAs. However, due to a timing issue, the entries for that array controller in the FCNS (Fibre Channel Name Server) database are still present when the host HBAs issue the GPN_ID, so the switches respond to that request instead of rejecting it. If you review the entry in http://lists.open-fcoe.org/pipermail/fcoe-devel/2009-June/002828.html you will see that code was added to validate that the target is actually off the fabric instead of assuming it would be based on the RSCN alone. There are various reasons to do this, but suffice it to say that it is better to be safe than sorry in this instance.

Unfortunately, there is no fix for this at this time, which is why this is potentially so impactful to our customers: it means they are effectively unable to perform array maintenance without the risk of VMs crashing or even corruption. Cisco is fixing this in 7.3(1), which is due out in a few weeks.

Here are a couple of references regarding this issue:

 

Cheers,
Nathan Small
Technical Director
Global Support Services
VMware

New and updated KB articles for NSX for vSphere 6.2.3

We’ve just released the bits for NSX for vSphere 6.2.3 and thought all of you making the upgrade would want to be on top of all the ins and outs of this release.

Here is a list of new and/or updated articles in our Knowledgebase:
NSX for vSphere 6.2.3   |   Released 09 June 2016   |   Build 3979471

Of course, do not miss the release notes, which can be found here: NSX for vSphere 6.2.3 Release Notes

Top 40 Horizon View articles for April 2016

Here is an update for the top Horizon View articles list for April 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Manually deleting linked clones or stale virtual desktop entries from the View Composer database in VMware View Manager and VMware Horizon View
  2. Using Microsoft Certreq to generate signed SSL certificates in VMware Horizon View
  3. VMware View ports and network connectivity requirements
  4. Removing invalid linked clone entries automatically using the ViewDBChk tool in VMware Horizon View 5.3 and later versions
  5. Provisioning View desktops fails due to customization timeout errors
  6. Linked Clone pool creation and recomposition fails with VMware Horizon View 6.1.x and older releases
  7. Finding and removing unused replica virtual machines in the VMware Horizon View
  8. Configure security protocols for PCoIP for Horizon 6 version 6.2 and later, and Horizon Client 3.5 and later
  9. Removing a standard (replica) connection server or a security server from a cluster of connection/security servers
  10. Restart order of the View environment to clear ADLDS (ADAM) synchronization in View 4.5, 4.6, 5.0, 5.1, 5.2, 5.3, 6.0, and 6.1
  11. Recommended restart cycle of the VMware Horizon View environment
  12. Troubleshooting SSL certificate issues in VMware Horizon View 5.1 and later
  13. Manually deleting replica virtual machines in VMware Horizon View 5.x
  14. Provisioning VMware Horizon View desktops fails with error: View Composer Agent initialization error (16): Failed to activate software license
  15. Black screen when logging in to a Horizon View virtual desktop
  16. Generating a Horizon View SSL certificate request using the Microsoft Management Console (MMC) Certificates snap-in
  17. Recompute virtual disk digest tasks storm in vCenter Server
  18. Performing an end-to-end backup and restore for VMware View Manager
  19. Forcing replication between ADAM databases
  20. View Connection Server fails to replicate
  21. The View virtual machine is not accessible and the View Administration console shows the virtual machine status as Already Used
  22. Configuring USB device filters and redirecting Human Interface Devices (HIDs) in View Client
  23. Poor virtual machine application performance may be caused by processor power management settings
  24. Network adapter disappears from a Windows virtual machine
  25. Changing the log file behavior in the VMware View components
  26. Troubleshooting USB redirection problems in VMware View Manager
  27. Administration dashboard in VMware Horizon View reports the error: Server’s certificate cannot be checked
  28. Connecting to the View ADAM Database
  29. View Manager Admin console displays the error: Error during provisioning: Unexpected VC fault from View Composer (Unknown)
  30. Troubleshooting and Frequently Asked Questions for space reclamation in VMware Horizon View 5.2.x and 5.3.x
  31. Troubleshooting VMware Horizon View HTML Access
  32. VMware View SVGA driver reports newer version than the installed View Agent version
  33. Troubleshooting Agent Unreachable status in VMware Horizon View
  34. Intermittent provisioning issues and generic errors when Composer and vCenter Server are co-installed
  35. Disabling the HotAdd/HotPlug capability in ESXi 5.x and ESXi/ESX 4.x virtual machines
  36. Troubleshooting frequent PCoIP disconnects in a VMware View environment
  37. Increasing the video memory on VMware View desktops
  38. Determining which replica disk is used for a linked clone virtual machine
  39. VMware View Administrator shows the virtual machine status error: Configuration Error
  40. View Connection Server reports the error: [ws_TomcatService] STDOUT: java.lang.OutOfMemoryError: Java heap space

Inventory objects fail to display in the vSphere Web Client

An issue recently causing a few calls into our support lines happens after installing or upgrading to vCenter Server 6.0. The issue also impacts the vCenter appliance and is not specific to Windows. The issue occurs in the vSphere Web Client due to caching within the VMware Inventory Service.

  • When browsing inventory under Hosts and Clusters, some or no objects are displayed
  • When using the Search function in the vSphere Web Client, the object is found
  • The issue does not occur in the vSphere Client
  • When browsing inventory under Related Objects at multiple levels you see this message:
no object found

If you suspect you have run into this, head on over to KB: In the vSphere Web Client 6.0 inventory objects fail to display (2144934) for more details on what to check in your logs and what can be done about it.

 

Important KB updates for current NSX for vSphere users - May 2016 Edition

Our NSX support team would like all of our customers to know about important KB updates for current NSX for vSphere issues. Here’s what’s new and trending:

Please take note of key updates to the following important End of General Support and End of Availability events:

New and important issues:

NSX for Multi-Hypervisor:

New master playbook KBs:

How to track the top field issues:

 

User account locked in vCenter Server Appliance

We’ve recently noticed a number of cases where vSphere administrators become locked out of their accounts or receive reports of incorrect passwords in the vCenter Server Appliance. If you find yourself in this position, here are two articles that address these issues:

KB 2034608
When attempting to log into the VMware vSphere 5.1, 5.5, or 6.0 Web Client, you observe the following symptom: “User account is locked. Please contact your administrator.” This often occurs if the wrong password was entered multiple times. Waiting out the default 15-minute lockout period will allow you to attempt the login again. If you are still not successful after multiple attempts, you may need to reset the password.

KB 2069041
When attempting to log into the vCenter Server 5.5 and 6.0 Appliance, you experience symptoms where the root account is locked out. This often occurs because the vCenter Server Appliance has a default 90-day password expiration policy. The article provides steps to modify the password expiration policy and to unlock the account.

Important NSX for vSphere KB Updates – March 2016

vCloud Networking and Security will reach end of availability and end of support on September 19, 2016.

  • KB 2144733 – End of Availability and End of Support Life for VMware vCloud Networking and Security 5.5.x
  • See the fully updated vCNS to NSX Upgrade Guide
  • See also KB 2144620 – VMware vCloud Networking and Security 5.5.x upgrade to NSX for vSphere 6.2.x Best Practices
  • Upgrade path from vCNS 5.x: Using the NSX upgrade bundle posted on or after 31 March, 2016, you may upgrade directly from vCNS 5.1.x or vCNS 5.5.x to NSX 6.2.2. Please see the NSX 6.2.2 release notes
  • Upgrades from NSX 6.1.5 to NSX 6.2.0 are not supported. Instead, you must upgrade from NSX 6.1.5 to NSX 6.2.1 or later to avoid a regression in functionality. Refer to KB 2129200

NSX for vSphere 6.1.x will reach end of availability and end of support on October 15, 2016

  • KB 2144769 – End of Availability and End of Support Life for VMware NSX for vSphere 6.1.x
  • The recommended release for NSX-V is 6.2.2. Refer to KB 2144295 – Recommended minimum version for VMware NSX for vSphere with Guest Introspection Driver, ESXi and vCenter server.

New issues:

  • KB 2144726 – Service Composer fails to translate virtual machines into security-groups in VMware NSX for vSphere 6.x
  • KB 2140891 – Storage vMotion of Edge appliance disrupts VIX communication in VMware vCloud Networking and Security 5.5.x and NSX for vSphere 6.x
  • KB 2144476 – After reinstalling vCenter Server 6.0 EAM fails to push VIB’s to ESXi host with the error: Host not covered by scope anymore
  • KB 2144456 – Importing draft firewall rules fails after existing firewall configuration is removed by a REST API request
  • KB 2144387 – After upgrading to VMware NSX for vSphere 6.2.2 there is no upgrade option available for NSX Guest Introspection and NSX Data Security and the services remain at version 6.2.1
  • KB 2144420 – Any changes to the Primary UDLR result in the vNic_0 being shutdown on the Secondary UDLR in VMware Cross-vCenter NSX for vSphere 6.2.x
  • KB 2144236 – VMtools issue – Virtual machine performance issues after upgrading VMware tools version to 10.0.x in NSX/ VMware vCloud Networking and Security 5.5.x
  • KB 2144649 – IPv4 IP address do not get auto approved when SpoofGuard policy is set to Trust On First Use (TOFU) in VMware NSX for vSphere 6.2.x
  • KB 2144732 – In VMware NSX for vSphere 6.x, unpreparing Stateless ESXi host fails with the error: Agent VIB module is not installed. Cause : 15 The installation transaction failed. The transaction is not supported
  • KB 2135956 – VMware ESXi 6.0 Update 1 host fails with a purple diagnostic screen and reports the error: PANIC bora/vmkernel/main/dlmalloc.c:4923 – Usage error in dlmalloc. Now resolved in vSphere 6.0U2. See also the vSphere 6.0U2 Release Notes
  • KB 2126275 – Publishing Distributed Firewall (DFW) rules fails after referenced object is deleted in VMware NSX for vSphere 6.1.x and 6.2.x

Tracking the top issues:

View desktop goes into Already Used state, or Agent Disabled state

In versions 4.6 through 5.0.x, View desktops can also go into the Already Used state if a virtual machine is powered on in another ESXi host in the cluster in response to an HA event, or if it was shut down without reporting to the broker that the user had logged out.

This is due to a security feature which prevents any previous session data from being available during the next log in. If a desktop that is set to refresh or delete after log off is reset, the desktop goes into the Already Used state, or possibly the Agent Disabled state.

If you run into this situation we have a KB article that covers this scenario: The View virtual machine is not accessible and the View Administration console shows the virtual machine status as “Already Used” (1000590)

PCoIP connections suddenly failing?

There is an issue which may occur when you install or upgrade your VMware Tools after having installed your View Agent. The set of VGA drivers shipped with VMware Tools might sometimes be incompatible with VMware View and PCoIP, whereas the VMware View Agent software contains a supported VGA driver.

For this reason, today we’re highlighting the KB article we have written up for this issue: Error attaching to SVGADevTap, error 4000: EscapeFailed reported by PCoIP server (1029706)

To resolve this issue, you must update the drivers to the version supplied with VMware View Agent. The KB also includes a nice table of video drivers supplied with different VMware View Agent versions and operating systems.

Dealing with issues with View Event Database

We’d like to focus today on a KB article that aims to address the issues users encounter with View Event Database deployment. The article includes a list of basic steps that will help you address issues that might be encountered during or after your deployment.

Configuring VMware View Event database on an SQL server fails with the error: An error occurred while attempting to configure the database (1029537)

User actions are recorded as View Manager events in this database. If the database is not configured, you would need to look in the log files to gather information about events, and the logs contain very limited information.

Configuration of the event database will fail if the prerequisites have not been met. Our KB article helps users meet all the prerequisites, from network connectivity, to SQL server, to correct credentials for database accounts, ports and firewall status.

If you find your issue persists after validating all 12 steps mentioned in the article, then the potential cause might be an older version of the JDBC driver. Newer versions have a number of bug fixes for SSL certificate issues. Following the steps to install a new version of the JDBC driver should address the issue and allow the configuration of the VMware View event database to complete successfully. Of course, if you’re unsure or still struggling, by all means contact our support team.