Category Archives: Datacenter

vSphere 6.5 is here! What you need to know

vSphere 6.5 has been released for all to download. We’re sure you vSphere users are all eager to install a copy and start kicking the tires, and we’re just as eager to see that you get started on the right foot. With this in mind, we have created the following list of Knowledge Base articles that are brand new or have been updated for vSphere 6.5.

First of all: Download VMware vSphere and Get Your vSphere License Key

KB articles recommended by VMware Support before you start your journey:

 

For more details on the release please refer to the vSphere 6.5 announcement.

If you are interested in learning more about vSphere 6.5, there are several options:

Top 20 vCenter Server articles for August 2016

Here is our Top 20 vCenter articles list for August 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Investigating virtual machine file locks on ESXi/ESX
  2. Using the VMware Knowledge Base
  3. Uploading diagnostic information for VMware through the Secure FTP portal
  4. Correlating build numbers and versions of VMware products
  5. Licensing VMware vCenter Site Recovery Manager
  6. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
  7. Resetting the VMware vCenter Server 5.x Inventory Service database
  8. Downloading, licensing, and using VMware products
  9. Build numbers and versions of VMware vCenter Server
  10. How to repoint and re-register vCenter Server 5.1 / 5.5 and components
  11. vSphere handling of LUNs detected as snapshot LUNs
  12. Upgrading to vCenter Server 6.0 best practices
  13. How to consolidate snapshots in vSphere 5.x/6.0
  14. ESXi 5.5 Update 3b and later hosts are not manageable after an upgrade
  15. Collecting diagnostic information for VMware vCenter Server 4.x, 5.x and 6.0
  16. How to enable EVC in vCenter Server
  17. Upgrading to vCenter Server 5.5 best practices
  18. VMware End User License Agreements
  19. “Failed to verify the SSL certificate for one or more vCenter Server Systems” error in the vSphere Web Client
  20. VMware vCenter Server 5.x fails to start with the error: Failed to add LDAP entry

Path failover may not be successful when using Cisco MDS Switches on NX-OS 7.3 and FCoE based HBAs

So I wanted to get this blog post out sooner rather than later, as it might affect a significant number of customers. In a nutshell, if you perform array maintenance that requires you to reboot a storage controller, the probability of successful path failover is low. This is effectively due to stale entries in the Fibre Channel name server on Cisco MDS switches running NX-OS 7.3, which is a rather new code release. As the title suggests, this only affects FCoE HBAs, specifically ones that rely on our libfc/libfcoe stack for FCoE connectivity. Such HBAs include Cisco fnic HBAs as well as a handful of Emulex FCoE HBAs and a couple of others.

Here is an example of a successful path failover after receiving an RSCN (Registered State Change Notification) from the array controller after performing a reboot:

2016-07-07T17:36:34.230Z cpu17:33461)<6>host4: disc: Received an RSCN event
 2016-07-07T17:36:34.230Z cpu17:33461)<6>host4: disc: Port address format for port (e50800)
 2016-07-07T17:36:34.230Z cpu17:33461)<6>host4: disc: RSCN received: not rediscovering. redisc 0 state 9 in_prog 0
 2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: disc: GPN_ID rejected reason 9 exp 1
 2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: rport e50800: Remove port
 2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: rport e50800: Port entered LOGO state from Ready state
 2016-07-07T17:36:34.231Z cpu14:33474)<6>host4: rport e50800: Delete port
 2016-07-07T17:36:34.231Z cpu54:33448)<6>host4: rport e50800: work event 3
 2016-07-07T17:36:34.231Z cpu54:33448)<7>fnic : 4 :: fnic_rport_exch_reset called portid 0xe50800
 2016-07-07T17:36:34.231Z cpu54:33448)<7>fnic : 4 :: fnic_rport_reset_exch: Issuing abts
 2016-07-07T17:36:34.231Z cpu54:33448)<6>host4: rport e50800: Received a LOGO response closed
 2016-07-07T17:36:34.231Z cpu54:33448)<6>host4: rport e50800: Received a LOGO response, but in state Delete
 2016-07-07T17:36:34.231Z cpu54:33448)<6>host4: rport e50800: work delete

Here is a breakdown of what you just read:

  1. RSCN is received from the array controller
  2. Operation is now in state = 9
  3. GPN_ID (Get Port Name ID) is issued to the switches but is rejected because the state is 9 (See http://lists.open-fcoe.org/pipermail/fcoe-devel/2009-June/002828.html)
  4. LibFC begins to remove the port information on the host
  5. Port enters LOGO (Logout) state from previous state, which was Ready
  6. LibFC Deletes the port information

After this, the ESX host will fail over to other available ports, which would be on the peer SP:

2016-07-07T17:36:44.233Z cpu33:33459)<3> rport-4:0-1: blocked FC remote port time out: saving binding
 2016-07-07T17:36:44.233Z cpu55:33473)<7>fnic : 4 :: fnic_terminate_rport_io called wwpn 0x524a937aeb740513, wwnn0xffffffffffffffff, rport 0x0x4309b72f3c50, portid 0xffffffff
 2016-07-07T17:36:44.257Z cpu52:33320)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x2a (0x43a659d15bc0, 36277) to dev "naa.624a93704d1296f5972642ea0001101c" on path "vmhba3:C0:T0:L1" Failed: H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:FAILOVER

A Host status of H:0x1 means NO_CONNECT, hence the failover.
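To make that last log line easier to read, here is a minimal Python sketch that pulls the device, path, status fields and action out of an NMP failure message. The function name, the regular expression and the small status table are my own illustration, not a VMware tool. Only the 0x1 = NO_CONNECT mapping comes from this post; 0x0 = OK is an added assumption, and any other code is simply reported as hex.

```python
import re

# Host status names. 0x1 = NO_CONNECT is taken from the post above;
# 0x0 = OK is an assumption added for illustration. Any other value
# is returned as raw hex rather than guessed at.
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT"}

# Hypothetical pattern written against the sample line shown in this post.
NMP_RE = re.compile(
    r'to dev "(?P<dev>[^"]+)" on path "(?P<path>[^"]+)" Failed: '
    r'H:(?P<h>0x[0-9a-fA-F]+) D:(?P<d>0x[0-9a-fA-F]+) P:(?P<p>0x[0-9a-fA-F]+)'
    r'.*?Act:(?P<act>\w+)'
)

def parse_nmp_line(line):
    """Return the interesting fields of an NMP failure line, or None."""
    m = NMP_RE.search(line)
    if m is None:
        return None
    host = int(m.group("h"), 16)
    return {
        "device": m.group("dev"),
        "path": m.group("path"),
        "host_status": HOST_STATUS.get(host, hex(host)),
        "device_status": int(m.group("d"), 16),
        "plugin_status": int(m.group("p"), 16),
        "action": m.group("act"),
    }

# The NMP line from the successful failover example above.
SAMPLE = (
    '2016-07-07T17:36:44.257Z cpu52:33320)NMP: nmp_ThrottleLogForDevice:3298: '
    'Cmd 0x2a (0x43a659d15bc0, 36277) to dev "naa.624a93704d1296f5972642ea0001101c" '
    'on path "vmhba3:C0:T0:L1" Failed: H:0x1 D:0x0 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0. Act:FAILOVER'
)
```

Calling parse_nmp_line(SAMPLE) returns a dictionary whose host_status is NO_CONNECT and whose action is FAILOVER, matching the reading given above.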

Now here is an example of the same operation on a Cisco MDS switch running NX-OS 7.3 when a storage controller on the array is rebooted:

2016-07-14T19:02:03.551Z cpu47:33448)<6>host2: disc: Received an RSCN event
 2016-07-14T19:02:03.551Z cpu47:33448)<6>host2: disc: Port address format for port (e50900)
 2016-07-14T19:02:03.551Z cpu47:33448)<6>host2: disc: RSCN received: not rediscovering. redisc 0 state 9 in_prog 0
 2016-07-14T19:02:03.557Z cpu47:33444)<6>host2: rport e50900: ADISC port
 2016-07-14T19:02:03.557Z cpu47:33444)<6>host2: rport e50900: sending ADISC from Ready state
 2016-07-14T19:02:23.558Z cpu47:33448)<6>host2: rport e50900: Received a ADISC response
 2016-07-14T19:02:23.558Z cpu47:33448)<6>host2: rport e50900: Error 1 in state ADISC, retries 0
 2016-07-14T19:02:23.558Z cpu47:33448)<6>host2: rport e50900: Port entered LOGO state from ADISC state
 2016-07-14T19:02:43.560Z cpu2:33442)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:02:43.560Z cpu2:33442)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:02:43.560Z cpu58:33446)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:03:03.563Z cpu54:33449)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:03:03.563Z cpu54:33449)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:03:03.563Z cpu2:33442)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:03:23.565Z cpu32:33447)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:03:23.565Z cpu32:33447)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:03:23.565Z cpu54:33449)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:03:43.567Z cpu50:33445)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:03:43.567Z cpu50:33445)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:03:43.567Z cpu32:33447)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:04:03.568Z cpu54:33443)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:04:03.568Z cpu54:33443)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:04:03.569Z cpu32:33472)<6>host2: rport e50900: Port entered LOGO state from LOGO state
 2016-07-14T19:04:43.573Z cpu20:33473)<6>host2: rport e50900: Received a LOGO response timeout
 2016-07-14T19:04:43.573Z cpu20:33473)<6>host2: rport e50900: Error -1 in state LOGO, retrying
 2016-07-14T19:04:43.573Z cpu54:33443)<6>host2: rport e50900: Port entered LOGO state from LOGO state

Notice the difference? Here is a breakdown of what happened this time:

  1. RSCN is received from the array controller
  2. Operation is now in state = 9
  3. GPN_ID (Get Port Name ID) is issued to the switches but is NOT rejected
  4. Since GPN_ID is valid, LibFC issues an Address Discovery (ADISC)
  5. 20 seconds later the ADISC times out and the port enters the LOGO state; each LOGO then times out and is retried every 20 seconds
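If you want to check your own hosts for this signature, the breakdown above can be turned into a rough log scan. The following Python sketch is only an illustration under stated assumptions: the function name, the regular expression and the two-timeout threshold are my own choices, and the message strings are copied from the log excerpts in this post, so other driver versions may word them differently.

```python
import re
from collections import Counter

# Matches libfc remote-port messages such as
# "...<6>host2: rport e50900: Received a LOGO response timeout"
RPORT_RE = re.compile(r"(host\d+): rport ([0-9a-f]+): (.*)$")

def find_stuck_rports(lines, min_timeouts=2):
    """Return {(host, rport): count} for ports that sent an ADISC and
    then logged at least min_timeouts LOGO response timeouts."""
    adisc_sent = set()
    timeouts = Counter()
    for line in lines:
        m = RPORT_RE.search(line.rstrip())
        if m is None:
            continue
        key = (m.group(1), m.group(2))
        message = m.group(3)
        if message.startswith("sending ADISC"):
            adisc_sent.add(key)
        elif message == "Received a LOGO response timeout" and key in adisc_sent:
            timeouts[key] += 1
    return {key: n for key, n in timeouts.items() if n >= min_timeouts}

# A few lines taken from the NX-OS 7.3 example above.
SAMPLE_LOG = """\
2016-07-14T19:02:03.557Z cpu47:33444)<6>host2: rport e50900: sending ADISC from Ready state
2016-07-14T19:02:23.558Z cpu47:33448)<6>host2: rport e50900: Error 1 in state ADISC, retries 0
2016-07-14T19:02:43.560Z cpu2:33442)<6>host2: rport e50900: Received a LOGO response timeout
2016-07-14T19:03:03.563Z cpu54:33449)<6>host2: rport e50900: Received a LOGO response timeout
""".splitlines()
```

Running find_stuck_rports(SAMPLE_LOG) flags rport e50900 on host2, while the lines from the successful failover example produce no hits because the LOGO response there is "closed" rather than a timeout.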

The problem is that the host keeps retrying like this until the array controller completes the reboot and is back online:

2016-07-14T19:04:47.276Z cpu56:33451)<6>host2: disc: Received an RSCN event
 2016-07-14T19:04:47.276Z cpu56:33451)<6>host2: disc: Port address format for port (e50900)
 2016-07-14T19:04:47.276Z cpu56:33451)<6>host2: disc: RSCN received: not rediscovering. redisc 0 state 9 in_prog 0
 2016-07-14T19:04:47.277Z cpu20:33454)<6>host2: rport e50900: Login to port
 2016-07-14T19:04:47.277Z cpu20:33454)<6>host2: rport e50900: Port entered PLOGI state from LOGO state
 2016-07-14T19:04:47.278Z cpu57:33456)<6>host2: rport e50900: Received a PLOGI accept
 2016-07-14T19:04:47.278Z cpu57:33456)<6>host2: rport e50900: Port entered PRLI state from PLOGI state
 2016-07-14T19:04:47.278Z cpu52:33458)<6>host2: rport e50900: Received a PRLI accept
 2016-07-14T19:04:47.278Z cpu52:33458)<6>host2: rport e50900: PRLI spp_flags = 0x21
 2016-07-14T19:04:47.278Z cpu52:33458)<6>host2: rport e50900: Port entered RTV state from PRLI state
 2016-07-14T19:04:47.278Z cpu57:33452)<6>host2: rport e50900: Received a RTV reject
 2016-07-14T19:04:47.278Z cpu57:33452)<6>host2: rport e50900: Port is Ready

What is actually happening here is that the Cisco MDS switches are quick to receive the RSCN from the array controller and pass it along to the host HBAs. However, due to a timing issue, the entries for that array controller in the FCNS (Fibre Channel Name Server) database are still present when the host HBAs issue the GPN_ID, so the switches respond to that request instead of rejecting it. If you review the entry in http://lists.open-fcoe.org/pipermail/fcoe-devel/2009-June/002828.html you will see that code was added to validate that the target is actually off the fabric instead of assuming it is based on the RSCN alone. There are various reasons to do this, but suffice it to say that it is better to be safe than sorry in this instance.

Unfortunately, there is no fix for this at this time, which is why this is potentially so impactful to our customers: it means they are effectively unable to perform array maintenance without the risk of VMs crashing or even corruption. Cisco is fixing this in 7.3(1), which is due out in a few weeks.

Here are a couple of references regarding this issue:

 

Cheers,
Nathan Small
Technical Director
Global Support Services
VMware

Top 20 vCenter Server articles for July 2016

Here is our Top 20 vCenter articles list for July 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Uploading diagnostic information for VMware using FTP
  2. Downloading, licensing, and using VMware products
  3. Licensing VMware vCenter Site Recovery Manager
  4. Collecting diagnostic information for VMware vCenter Server 4.x, 5.x and 6.0
  5. Using the VMware Knowledge Base
  6. Best practices for upgrading to vCenter Server 6.0
  7. ESXi hosts are no longer manageable after an upgrade
  8. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
  9. Consolidating snapshots in vSphere 5.x/6.0
  10. Diagnosing an ESXi/ESX host that is disconnected or not responding in VMware vCenter Server
  11. How to unlock and reset the vCenter SSO administrator password
  12. Resetting the VMware vCenter Server 5.x Inventory Service database
  13. Correlating build numbers and versions of VMware products
  14. Back up and restore vCenter Server Appliance/vCenter Server 6.0 vPostgres database
  15. Build numbers and versions of VMware vCenter Server
  16. Re-pointing and re-registering VMware vCenter Server 5.1 / 5.5 and components
  17. “Deprecated VMFS volume(s) found on the host” error in ESXi hosts
  18. vmware-dataservice-sca and vsphere-client status change from green to yellow
  19. Investigating virtual machine file locks on ESXi/ESX
  20. VMware End User License Agreements

Top 20 vCenter Server articles for June 2016

Here is our Top 20 vCenter articles list for June 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Purging old data from the database used by VMware vCenter Server
  2. ESXi 5.5 Update 3b and later hosts are no longer manageable after upgrade
  3. Resetting the VMware vCenter Server and vCenter Server Appliance 6.0 Inventory Service database
  4. Unlocking and resetting the VMware vCenter Single Sign-On administrator password
  5. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
  6. Upgrading to vCenter Server 6.0 best practices
  7. Correlating build numbers and versions of VMware products
  8. Update sequence for vSphere 6.0 and its compatible VMware products
  9. Stopping, starting, or restarting VMware vCenter Server services
  10. In vCenter Server 6.0, the vmware-dataservice-sca and vsphere-client status change from green to yellow continually
  11. Enabling EVC on a cluster when vCenter Server is running in a virtual machine
  12. The vpxd process becomes unresponsive after upgrading to VMware vCenter Server 5.5
  13. Migrating the vCenter Server database from SQL Express to full SQL Server
  14. Reducing the size of the vCenter Server database when the rollup scripts take a long time to run
  15. Consolidating snapshots in vSphere 5.x/6.0
  16. Back up and restore vCenter Server Appliance/vCenter Server 6.0 vPostgres database
  17. Diagnosing an ESXi/ESX host that is disconnected or not responding in VMware vCenter Server
  18. Build numbers and versions of VMware vCenter Server
  19. Increasing the size of a virtual disk
  20. Determining where growth is occurring in the VMware vCenter Server database

Windows 2008+ incremental backups become full backups in ESXi 6.0 b3825889

VMware is actively working to address a recently discovered issue wherein an incremental backup becomes a full backup when backing up Windows 2008 (or higher) virtual machines with a VSS-based application quiesced snapshot.

This recent CBT (Changed Block Tracking) issue does not cause any data loss or data corruption.

This issue is well understood and VMware engineering is actively working on a fix.

For more details on this issue and latest status on resolution, please refer to KB article: After upgrading to ESXi 6.0 Build 3825889, incremental virtual machine backups effectively run as full backups when application consistent quiescing is enabled (2145895)

Subscribe to the RSS feed for the KB article by using this link to ensure you do not miss any updates.

Top 20 vCenter articles for May 2016

Here is our Top 20 vCenter articles list for May 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Purging old data from the database used by VMware vCenter Server
  2. ESXi 5.5 Update 3b and later hosts are no longer manageable after upgrade
  3. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
  4. Upgrading to vCenter Server 6.0 best practices
  5. ESX/ESXi host keeps disconnecting and reconnecting when heartbeats are not received by vCenter Server
  6. Unlocking and resetting the VMware vCenter Single Sign-On administrator password
  7. Consolidating snapshots in vSphere 5.x/6.0
  8. Powering on a virtual machine fails after a storage outage with the error: could not open/create change tracking file
  9. Diagnosing an ESXi/ESX host that is disconnected or not responding in VMware vCenter Server
  10. VMware vSphere Web Client displays the error: Failed to verify the SSL certificate for one or more vCenter Server Systems
  11. Deprecated VMFS volume warning reported by ESXi hosts
  12. Resetting the VMware vCenter Server and vCenter Server Appliance 6.0 Inventory Service database
  13. Cannot take a quiesced snapshot of Windows 2008 R2 virtual machine
  14. vCenter Server 5.5 fails to start after reboot with the error: Unable to create SSO facade: Invalid response code: 404 Not Found
  15. Update sequence for vSphere 6.0 and its compatible VMware products
  16. Registering or adding a virtual machine to the Inventory in vCenter Server or in an ESX/ESXi host
  17. Back up and restore vCenter Server Appliance/vCenter Server 6.0 vPostgres database
  18. Updating rollup jobs after the error: Performance data is currently not available for this entity
  19. Configuring VMware vCenter Server to send alarms when virtual machines are running from snapshots
  20. Determining where growth is occurring in the VMware vCenter Server database

Top 20 ESXi articles for May 2016

Here is our Top 20 ESXi articles list for May 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. VMware ESXi 5.x host experiences a purple diagnostic screen mentioning E1000PollRxRing and E1000DevRx
  2. ESXi 5.5 Update 3b and later hosts are no longer manageable after upgrade
  3. Commands to monitor snapshot deletion in VMware ESXi/ESX
  4. Recreating a missing virtual machine disk descriptor file
  5. Determining Network/Storage firmware and driver version in ESXi/ESX 4.x, ESXi 5.x, and ESXi 6.x
  6. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
  7. Installing patches on an ESXi 5.x/6.x host from the command line
  8. Identifying and addressing Non-Maskable Interrupt events on an ESX/ESXi host
  9. Restarting the Management agents on an ESXi or ESX host
  10. Downloading and installing async drivers in VMware ESXi 5.x and ESXi 6.0.x
  11. Enabling or disabling VAAI ATS heartbeat
  12. ESXi 5.5 or 6.0 host disconnects from vCenter Server with the syslog.log error: Unable to allocate memory
  13. Powering off a virtual machine on an ESXi host
  14. Consolidating snapshots in vSphere 5.x/6.0
  15. Powering on a virtual machine fails after a storage outage with the error: could not open/create change tracking file
  16. Snapshot consolidation in VMware ESXi 5.5.x and ESXi 6.0.x fails with the error: maximum consolidate retries was exceeded for scsix:x
  17. Reverting to a previous version of ESXi
  18. Configuring a diagnostic coredump partition on an ESXi 5.x/6.0 host
  19. Diagnosing an ESXi/ESX host that is disconnected or not responding in VMware vCenter Server
  20. Enabling or disabling simultaneous write protection provided by VMFS using the multi-writer flag

Inventory objects fail to display in the vSphere Web Client

An issue that has recently caused a few calls into our support lines occurs after installing or upgrading to vCenter Server 6.0. The issue also impacts the vCenter appliance and is not specific to Windows. It occurs in the vSphere Web Client due to caching within the VMware Inventory Service.

  • When browsing inventory under Hosts and Clusters, some or no objects are displayed
  • When using the Search function in the vSphere Web Client, the object is found
  • The issue does not occur in the vSphere Client
  • When browsing inventory under Related Objects at multiple levels you see this message:
no object found

If you suspect you have run into this, head on over to KB: In the vSphere Web Client 6.0 inventory objects fail to display (2144934) for more details on what to check in your logs and what can be done about it.

 

Top 20 vCenter articles for April 2016

Here is our Top 20 vCenter articles list for April 2016. This list is ranked by the number of times a VMware Support Request was resolved by following the steps in a published Knowledge Base article.

  1. Purging old data from the database used by VMware vCenter Server
  2. After upgrading an ESXi host to 5.5 Update 3b and later, the host is no longer manageable by vCenter Server
  3. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
  4. Unlocking and resetting the VMware vCenter Single Sign-On administrator password
  5. Consolidating snapshots in vSphere 5.x/6.0
  6. Deprecated VMFS volume warning reported by ESXi hosts
  7. Upgrading to vCenter Server 6.0 best practices
  8. Resetting the VMware vCenter Server and vCenter Server Appliance 6.0 Inventory Service database
  9. Linked Clone pool creation and recomposition fails with VMware Horizon View 6.1.x and older releases
  10. Replacing default certificates with CA signed SSL certificates in vSphere 6.0
  11. Update sequence for VMware vSphere 5.5 and its compatible VMware products
  12. ESX/ESXi host keeps disconnecting and reconnecting when heartbeats are not received by vCenter Server
  13. VMware vSphere Web Client displays the error: Failed to verify the SSL certificate for one or more vCenter Server Systems
  14. Updating vCenter Server 5.5 to Update 3 fails with the error: Warning 32014. A utility for phone home data collector couldn’t be executed successfully
  15. Reducing the size of the vCenter Server database when the rollup scripts take a long time to run
  16. vCenter Server fails immediately after starting with the error: Fault Module: ntdll.dll
  17. Replacing a vSphere 6.0 Machine SSL certificate with a Custom Certificate Authority Signed Certificate
  18. In the vSphere Web Client 6.0 inventory objects fail to display
  19. Migrating virtual machines with Raw Device Mappings (RDMs)
  20. Cannot take a quiesced snapshot of Windows 2008 R2 virtual machine