Important Announcement regarding the KB Digest blog

We have an important announcement today regarding this blog and the KB Digest blog.

Since 2007, our sister blog, The KB Digest, has been published weekly, listing the new KB articles published the previous week, organized by product.

What is being changed, and why?

The KB Digest is being merged into this blog, under the category KB Digest. With one central location for all Support communications with our customers, we expect to be able to build our brand and broaden our audience.

Specifics:

1. Starting January 16, 2017, no new content will be published to The KB Digest blog. Current content will remain active for one month, after which the blog will be removed.
2. New posts will be found at The Support Insider, under the category KB Digest.
3. RSS subscribers will continue to receive updates uninterrupted (i.e., no changes to your subscription are needed).

How to revert an ESXi host to a previous version using DCUI

This video demonstrates how to revert an ESXi host to a previous version using the Direct Console User Interface (DCUI).

VMware recommends that you back up the configuration data before proceeding with any changes.

Reverting an ESXi host is only available if the host was updated using one of the following methods:

  • VIB installation or removal
  • Profile installation or removal
  • ESXi host updated using VMware Update Manager
  • ESXi host updated from an ISO

A look at All Paths Down in vSphere

Today we have a guest post from Karthick Sivaramakrishnan, who is a three-year veteran at VMware. His primary field of expertise is vSphere Storage and Site Recovery Manager.

This blog post covers how ESXi handles unscheduled storage disconnects on vSphere 5.x and 6.x. An unscheduled storage disconnect means that some issue in the vSphere environment has led to an All-Paths-Down (APD) condition for a datastore. An APD situation occurs when an ESXi host has no path left to communicate with a LUN on the storage array.

An ESXi host can encounter an APD under several conditions. As a result, VMs running on the affected datastore may go down, the host could get disconnected from vCenter, and in the worst cases ESXi could become unresponsive.

From vSphere 5.x onward, ESXi can discern whether a disconnect is permanent or transient. A transient disconnect leads to the All Paths Down state, in which ESXi expects the device to return after a temporary disconnect. With Permanent Device Loss (PDL), the device is expected to have a non-recoverable issue, such as a hardware error or the LUN being unmapped.

In the example below, all iSCSI datastores are in an inactive state.

Datastores

To determine what caused this issue, we review the ESXi logs, particularly the vmkernel and vobd logs. The issue will be evident in the vmkernel log.

vmkernel log

2017-01-10T13:04:26.803Z cpu1:32896)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.6000eb31dffdc33a0000000000000028] has entered the All Paths Down state.

2017-01-10T13:04:26.818Z cpu0:32896)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.6000eb31dffdc33a000000000000002a] has entered the All Paths Down state.

vobd log

2017-01-10T13:04:26.905Z: [scsiCorrelator] 475204262us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.6000eb31dffdc33a0000000000000028. Path vmhba33:C0:T1:L0 is down. Affected datastores: "Green".

2017-01-10T13:04:26.905Z: [scsiCorrelator] 475204695us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.6000eb31dffdc33a000000000000002a. Path vmhba33:C0:T0:L0 is down. Affected datastores: "Grey".

From these logs we understand that the ESXi host has lost connectivity to the datastores. Any virtual machines using an affected datastore may become unresponsive. In this example, while the datastores were mounted on ESXi, we lost the network uplink on the NIC that was used for the iSCSI connection. This was a transient issue, and the datastores came back up once the network uplink was restored.
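
To pull APD entries out of a larger vmkernel log, something along these lines could be used. It is only a minimal sketch: the regular expression is modeled on the two vmkernel lines quoted above and is an assumption, since other ESXi builds may word the message differently. The function name `find_apd_events` is made up for this illustration.

```python
import re

# Pattern modeled on the two vmkernel lines quoted in this post.
# Assumption: other ESXi builds may format this message differently.
APD_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+cpu\d+:\d+\)StorageApdHandlerEv: \d+: "
    r"Device or filesystem with identifier \[(?P<device>[^\]]+)\] "
    r"has entered the All Paths Down state\."
)


def find_apd_events(lines):
    """Return a (timestamp, device) tuple for every APD entry in lines."""
    events = []
    for line in lines:
        match = APD_PATTERN.match(line.strip())
        if match:
            events.append((match.group("timestamp"), match.group("device")))
    return events


# The two sample lines are copied from the vmkernel log shown above.
sample = [
    "2017-01-10T13:04:26.803Z cpu1:32896)StorageApdHandlerEv: 110: Device or "
    "filesystem with identifier [naa.6000eb31dffdc33a0000000000000028] has "
    "entered the All Paths Down state.",
    "2017-01-10T13:04:26.818Z cpu0:32896)StorageApdHandlerEv: 110: Device or "
    "filesystem with identifier [naa.6000eb31dffdc33a000000000000002a] has "
    "entered the All Paths Down state.",
]

for timestamp, device in find_apd_events(sample):
    print(timestamp, device)
```

The device identifiers returned here can then be matched against the vobd entries to see which datastores were affected.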

In the example below, the datastore Black is in an inactive state.

Datastore view missing

Looking into the logs to determine what is going on, we see these events.

vmkernel log

2017-01-09T12:42:09.365Z cpu0:32888)ScsiDevice: 6878: Device naa.6000eb31dffdc33a0000000000000063 APD Notify PERM LOSS; token num:1

2017-01-09T12:42:09.366Z cpu1:32916)StorageApdHandler: 1066: Freeing APD handle 0x430180b88880 [naa.6000eb31dffdc33a0000000000000063]

2017-01-09T12:49:01.260Z cpu1:32786)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0xc1) PDL error (0x5/0x25/0x0) - path vmhba33:C0:T3:L0 device naa.6000eb31dffdc33a0000000000000063 - triggering path evaluation

2017-01-09T12:49:01.260Z cpu1:32786)ScsiDeviceIO: 2651: Cmd(0x439d802ec580) 0xfe, CmdSN 0x4b7 from world 32776 to dev "naa.6000eb31dffdc33a0000000000000063" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

2017-01-09T12:49:01.300Z cpu0:40210)WARNING: NMP: vmk_NmpSatpIssueTUR:1043: Device naa.6000eb31dffdc33a0000000000000063 path vmhba33:C0:T3:L0 has been unmapped from the array

After some time passes you will see this message:

2017-01-09T13:13:11.942Z cpu0:32872)ScsiDevice: 1718: Permanently inaccessible device :naa.6000eb31dffdc33a0000000000000063 has no more open connections. It is now safe to unmount datastores (if any) and delete the device.

In this case the LUN was unmapped from the array for this host, which is not a transient issue. Sense data 0x5 0x25 0x0 corresponds to "LOGICAL UNIT NOT SUPPORTED", which indicates that the device is in the Permanent Device Loss (PDL) state. Once ESXi knows the device is in the PDL state, it does not wait for the device to return.

ESXi checks only the ASC/ASCQ values; if they are 0x25/0x0 or 0x68/0x0, it marks the device as PDL.
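
The ASC/ASCQ rule can be sketched in a few lines. This is an illustration of the check as described in this post, not ESXi's actual implementation. The function name `is_pdl_sense` is hypothetical, and the pattern assumes the "Valid sense data" format seen in the ScsiDeviceIO line above.

```python
import re

# ASC/ASCQ pairs that this post says cause ESXi to mark a device as PDL.
PDL_ASC_ASCQ = {(0x25, 0x0), (0x68, 0x0)}

# Assumption: sense data appears as "Valid sense data: <key> <ASC> <ASCQ>",
# as in the ScsiDeviceIO line quoted above.
SENSE_PATTERN = re.compile(
    r"Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)"
)


def is_pdl_sense(line):
    """Return True if the line's ASC/ASCQ pair is one of the PDL pairs."""
    match = SENSE_PATTERN.search(line)
    if not match:
        return False
    # The sense key is parsed but ignored: only ASC/ASCQ decide the result.
    _sense_key, asc, ascq = (int(value, 16) for value in match.groups())
    return (asc, ascq) in PDL_ASC_ASCQ


# Sample line copied from the vmkernel log shown above.
pdl_line = (
    '2017-01-09T12:49:01.260Z cpu1:32786)ScsiDeviceIO: 2651: '
    'Cmd(0x439d802ec580) 0xfe, CmdSN 0x4b7 from world 32776 to dev '
    '"naa.6000eb31dffdc33a0000000000000063" failed H:0x0 D:0x2 P:0x0 '
    'Valid sense data: 0x5 0x25 0x0.'
)

print(is_pdl_sense(pdl_line))
```

Ignoring the sense key mirrors the statement that only ASC/ASCQ are checked. A line with any other ASC/ASCQ pair, or with no sense data at all, returns False.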

VMware KB 2004684 has in-depth information on APD and PDL situations. It also covers planned and unplanned PDL. You can read it here: Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x (2004684)

Further on in the hostd logs, you will see additional events that correlate to the storage connection. Look for the event IDs below.

Event ID : esx.problem.storage.connectivity.lost

The “esx.problem.storage.connectivity.lost” event indicates a loss in connectivity to the specified storage device. Any virtual machines using the affected datastore may become unresponsive.

Event ID : esx.problem.scsi.device.state.permanentloss

The “esx.problem.scsi.device.state.permanentloss” event indicates a permanent device loss.
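
When scanning logs for these two event IDs, a small lookup like the following may help. It is a hedged sketch: it assumes the event ID appears in square brackets, as it does in the vobd lines quoted earlier, and the hostd log may present the ID differently. The function name `classify_event` is made up for illustration.

```python
import re

# Meanings taken from the two event descriptions in this post.
EVENT_MEANINGS = {
    "esx.problem.storage.connectivity.lost":
        "loss in connectivity to the storage device",
    "esx.problem.scsi.device.state.permanentloss":
        "permanent device loss",
}

# Assumption: the event ID is wrapped in square brackets, as in vobd.
EVENT_PATTERN = re.compile(r"\[(esx\.problem\.[\w.]+)\]")


def classify_event(line):
    """Return (event_id, meaning) for a known event ID, otherwise None."""
    match = EVENT_PATTERN.search(line)
    if not match:
        return None
    event_id = match.group(1)
    meaning = EVENT_MEANINGS.get(event_id)
    return (event_id, meaning) if meaning else None


# Sample line copied from the vobd log shown earlier in this post.
vobd_line = (
    "2017-01-10T13:04:26.905Z: [scsiCorrelator] 475204262us: "
    "[esx.problem.storage.connectivity.lost] Lost connectivity to storage "
    "device naa.6000eb31dffdc33a0000000000000028. Path vmhba33:C0:T1:L0 is "
    'down. Affected datastores: "Green".'
)

print(classify_event(vobd_line))
```

Lines with no bracketed event ID, or with an event ID not covered in this post, return None.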

Top 20 vRealize Operations Manager articles for December 2016

Here is our Top 20 vRealize Operations Manager articles list for December 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. vRealize Operations Manager 6.1, 6.2, and 6.2.1 Sizing Guidelines
  2. Adding additional storage to vCenter Operations Manager Analytics or UI virtual machine
  3. After cancelling the selected alerts in VMware vRealize Operations Manager 6.0.x, one or more of the selected alerts remain
  4. Optimization of vRealize Operations Manager generated capacity planning metrics in 6.3
  5. vRealize Operations Manager 6.0.1, 6.0.2, and 6.0.3 Sizing Guidelines
  6. How to manually change the status of the node or the cluster in vRealize Operations Manager 6.x
  7. Cassandra service in VMware vRealize Operations Manager 6.1 and 6.2 fails with the error: Connection refused
  8. vSphere 6.x tags are not populated in vRealize Operations Manager 6.x
  9. After upgrading to vRealize Operations Manager 6.4 Objects show no Collection Status
  10. Performing a side-by-side upgrade of a VMware vRealize Operations 5.x Appliance to vRealize Operations Manager 6.0.x
  11. vRealize Operations Manager Sizing Guidelines
  12. Enabling SSH access in vRealize Operations Manager 6.x
  13. How to change a node hostname in vRealize Operations Manager vApp based cluster
  14. Manually removing a node from the VMware vRealize Operations Manager 6.x cluster
  15. Removing a solution from VMware vRealize Operations Manager 6.x
  16. Changing the IP address on a VMware vRealize Operations Manager single-node deployment
  17. An adapter instance of type vRealizeOpsMgrAPI is present on all VMware vRealize Operations Manager nodes
  18. Log in to the Admin or Product UI of VMware vRealize Operations Manager 6.x as admin fails with the error: Incorrect User name/Password
  19. VMware vRealize Operations Manager for Horizon 6.2 Broker Agent fails to pair with the Horizon adapter
  20. Node status reports as Waiting for Analytics in vRealize Operations Manager 6.x

Top 20 vRealize Automation articles for December 2016

Here is our Top 20 vRealize Automation (vRA) articles list for December 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. How to remove a virtual machine from Management in vRA or vCAC
  2. vRA appliance runs out of disk space
  3. Removing a virtual machine from vRealize Automation 7.x using Cloud Client
  4. Various tasks fail while calling stored procedures with errors: “Error executing query usp_”
  5. Error ‘Unable to authenticate user. Please try again’ when logging to vRA 6.2.x VAMI page
  6. Troubleshooting 404 and 401 errors in vRA 6.x by validating component configuration
  7. Confirming and bench testing MSDTC communication for vRA
  8. vRA 7.x services fail when modifying the vIDM database
  9. Directory sync in vRA 7.x fails with error ‘Connector Communication failed with Response’
  10. Provisioning a machine in VMware vRealize Automation fails with the error: CloneVM : Sequence contains no matching element
  11. Renaming the VMware vSphere Cluster used in a VMware vRealize Automation environment
  12. Joining the VMWare vCenter Server Appliance or VMware vRealize Automation Identity Appliance to a domain fails with the error: Error trying to join AD, error code [31]
  13. Using the VMware vRealize Production Test Tool to validate and test your vRealize Automation configuration
  14. vRA 7.0 installation wizard prerequisite checker halts on error: “Waiting for host to trigger a restart”
  15. Error ‘BadCredentialsException’ when connecting to the embedded vRO instance on a clustered or HA distribution of vRA 7.0
  16. After the vRA update, Infrastructure tab fails with 401 Error
  17. Status of TLSv1.1/1.2 Enablement and TLSv1.0 Disablement across VMware products
  18. Logging in to embedded vRealize Orchestrator fails
  19. ASD day 2 action is executed on the wrong VM in vRA
  20. How to move the vRA SQL database to a new server

Inconsistencies between the IDP hostname and the vRA hostname producing behavior and authentication issues in vRealize Automation 7.x

Questions:

  • Have you been getting certificate trust issues when logging into vRealize Automation 7.x only to find that the certificate is trusted once you are logged in?
  • Have you ever noticed that your login page for vRA 7.x uses a different host name in the URL than the application?
  • Have you been having difficulty logging into the embedded vRealize Orchestrator instance?
  • Have you been trying to use vRealize CodeStream only to find unusual authentication issues?

It could be that you have a simple misconfiguration in your deployment that can be easily corrected. That is the subject of this blog post.

For many deployments, the fully qualified domain name (FQDN) selected to access your vRealize Automation (vRA) 7.x application is different from the actual host name of the box. This alias could be created for a simple installation to make it easier for users to recall the URL, or it could be used to access the VIP provided by a load balancer in a more distributed installation.

Depending upon the steps taken during the deployment (e.g., changing the vRA FQDN after the installation), the hostname in the IDP can be different from the hostname configured for use with vRA. For example, the following image illustrates the hostname configured for accessing the vRA application in a hands-on lab:

vRealize Hostname

However, after appropriate tinkering, when users attempt to log in, they see the following:

Connection is Untrusted

To correct this, take appropriate backups, then log in as a user with permission to administer the directory configuration (i.e., a tenant administrator). Select the Administration tab and navigate to Directories Management > Identity Providers. Select an IDP and you will be presented with an interface similar to the following:

IDP Hostname

Take note of the hostname and adjust this to be consistent with the vRA application FQDN found in the administration interface (VAMI) above. It is particularly important to rectify this if you happen to have the short name in the IDP, which can cause a number of problems. The changes take place immediately without restarting any services, though there may be additional steps needed to correct a particular issue. For example, I have had to re-register the vRO authentication after correcting this due to the state it was in after troubleshooting.
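
The consistency check described above can be expressed as a short sketch. It only compares two strings you have already read from the IDP page and the VAMI and does not call any vRA API. The function name `check_idp_hostname` and the example hostnames are hypothetical.

```python
def check_idp_hostname(idp_hostname, vra_fqdn):
    """Return a list of problems found; an empty list means consistent."""
    problems = []
    # Hostnames are case-insensitive; a trailing dot is ignored as well.
    idp = idp_hostname.strip().lower().rstrip(".")
    vra = vra_fqdn.strip().lower().rstrip(".")
    if "." not in idp:
        problems.append("IDP hostname is a short name, not an FQDN")
    if idp != vra:
        problems.append("IDP hostname does not match the vRA FQDN")
    return problems


# Hypothetical values, for illustration only.
print(check_idp_hostname("vra", "vra.corp.local"))
print(check_idp_hostname("vra.corp.local", "VRA.corp.local"))
```

The first call reports both problems, because a short name can never equal the FQDN. The second call returns an empty list, since the two names differ only in case.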

For additional reading, please review the following KB article: Logging in to embedded vRealize Orchestrator fails (2146063)

Top 20 vCenter Server articles for December 2016

Here is our Top 20 vCenter articles list for December 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Build numbers and versions of VMware vCenter Server
  2. Uploading diagnostic information for VMware through the Secure FTP portal
  3. Downloading, licensing, and using VMware products
  4. Using the VMware Knowledge Base
  5. Support Contracts FAQs
  6. Collecting diagnostic information for VMware vCenter Server 4.x, 5.x and 6.x
  7. How to consolidate snapshots in vSphere 5.x/6.x
  8. Investigating virtual machine file locks on ESXi
  9. Troubleshooting an ESXi/ESX host in non responding state
  10. Resetting the VMware vCenter Server 5.x Inventory Service database
  11. Licensing VMware vCenter Site Recovery Manager
  12. vSphere handling of LUNs detected as snapshot LUNs
  13. Update sequence for vSphere 6.0 and its compatible VMware products
  14. How to repoint and re-register vCenter Server 5.1 / 5.5 and components
  15. vmware-dataservice-sca and vsphere-client status change from green to yellow
  16. How to register/add a VM to the Inventory in vCenter Server
  17. Upgrading to vCenter Server 6.0 Update 2a fails on VCSServiceManager with error code ‘1603’
  18. Unable to grow or expand a VMFS volume or datastore
  19. VMware End User License Agreements
  20. Back up and restore vCenter Server Appliance/vCenter Server 6.0 vPostgres database

Top 20 NSX articles for December 2016

Here is our Top 20 NSX articles list for December 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Important information before upgrading to vSphere 6.0
  2. “The pending transaction requires xxx MB free space” error when installing VIBs
  3. Downloading and enabling VMware vShield Endpoint 5.x on supported VMware vSphere platforms
  4. vCenter Server or Platform Services Controller certificate validation error for external VMware Solutions in vSphere 6.0
  5. Registering NSX Manager to Lookup Service with External Platform Service Controller (PSC) fails with the error: server certificate chain not verified
  6. “Failed to restore PF state : Limit exceeded” error after NSX upgrade
  7. vsfwd connection to the NSX Manager fails
  8. vShield/NSX Manager Web interface does not update in VMware vCloud Networking and Security 5.x and NSX for vSphere 6.x
  9. Troubleshooting NSX for vSphere 6.x Distributed Firewall (DFW)
  10. NSX Controller disconnected or isolates intermittently
  11. Licensing VMware vSphere 5.5.x/6.0.x and VMware NSX for vSphere 6.x
  12. Duplicate VTEPs in ESXi hosts after rebooting vCenter Server
  13. Status of TLSv1.1/1.2 Enablement and TLSv1.0 Disablement across VMware products
  14. /var/log partition is full in NSX Edge loadbalancer
  15. Creating a logical switch fails in NSX
  16. Response to OpenSSL security issue CVE-2014-0160/CVE-2014-0346 a.k.a: “Heartbleed”
  17. Securing VMware NSX for vSphere 6.0 CLI User Accounts and Privileged mode
  18. Network Port requirements for VMware NSX for vSphere 6.x
  19. A virtual machine loses network connectivity during migration between resource pools, clusters or vApps in vCloud Networking and Security 5.1.4, 5.5.2 and NSX for vSphere 6.0.4
  20. VMware Virtual Appliances and customizations to operating system and included packages

Top 20 ESXi articles for December 2016

Here is our Top 20 ESXi articles list for December 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Product offerings for vSphere 6.x
  2. Uploading diagnostic information for VMware using FTP
  3. How to purchase and file Pay Per Incident support for VMware products
  4. Downloading and licensing vSphere Hypervisor (ESXi 6.x)
  5. Uploading diagnostic information for VMware through the Secure FTP portal
  6. Downloading, licensing, and using VMware products
  7. Using the VMware Knowledge Base
  8. Support Contracts FAQs
  9. ESXi host fails with PSOD when using Intel Xeon Processor E5 v4, E7 v4, and D-1500 families
  10. Snapshot removal task stops at 99% in ESXi/ESX
  11. How to consolidate snapshots in vSphere 5.x/6.x
  12. Unable to power off the virtual machine in an ESXi host
  13. “esxcli software vib” commands to patch an ESXi 5.x/6.x host
  14. Creating a persistent scratch location for ESXi 4.x/5.x/6.x
  15. Determining Network/Storage firmware and driver version in ESXi 4.x and later
  16. Unable to delete the virtual machine snapshots
  17. Investigating virtual machine file locks on ESXi
  18. How to unmount a LUN or detach a datastore device from ESXi hosts
  19. ESXi 5.x with E1000e adapter fails with purple diagnostic screen
  20. Recreating a missing virtual machine disk descriptor file

New KB articles published for week ending 01/03/17

VMware Horizon
Cannot reconnect to allocated View Desktop when using USB redirection (2148409)
Date Published: 12/28/2016

VMware Horizon Air Hybrid-Mode
When you use the maximum number of DHCP addresses available, the system improperly reports capacity (2146068)
Date Published: 12/31/2016

VMware Integrated OpenStack
VMware Integrated OpenStack new instance fail to get DHCP when using a new NSX Edge (2148198)
Date Published: 12/29/2016
Configuring and troubleshooting Load Balancing as a Service in Integrated Openstack 3.0 (2147657)
Date Published: 12/28/2016

VMware vCenter Server
cloud.uuid remains in the cloned vmx file after cloning the vCloud Director system (2146488)
Date Published: 12/30/2016
Error “503 Service Unavailable” when vCenter Server installed on Windows Server 2016 (2148380)
Date Published: 12/26/2016

VMware vCenter Site Recovery Manager
SRM service fails with the error: The multi-part identifier “TMP.mo_id” could not be bound (2148245)
Date Published: 12/29/2016

VMware vCloud Director for Service Provider
Logging in to vCloud Director using LDAP with Kerberos fails (2147599)
Date Published: 12/28/2016

VMware vRealize Business for Cloud Standard
vRealize Business cost calculation is in red due to a snapshot creation failure (2145614)
Date Published: 12/30/2016

VMware vRealize Code Stream
vRealize Code Stream 2.2 Hot Fix 1 (2148339)
Date Published: 1/1/2017

VMware vRealize Configuration Manager
Error “An error occurred on the server when processing the URL. Please contact the system administrator” or “500 Internal Server Error” when adding or editing a scheduled job in vRealize Configuration Manager (2145430)
Date Published: 12/29/2016

VMware vRealize Log Insight
vCenter Server events tasks and alarms does not work in vRealize Log Insight 3.6 (2147868)
Date Published: 12/26/2016

VMware vRealize Orchestrator
How to enable remote debugging for workflow execution with vRO/ vRO smart client (2148402)
Date Published: 12/29/2016

VMware vSphere Replication
vSphere Replication takes 24 hours to start after removing a large number of hosts from the environment (2146504)
Date Published: 12/30/2016