
Tag Archives: Troubleshooting

Important KB updates for current NSX for vSphere users - May 2016 Edition

Our NSX support team would like all of our customers to know about important KB updates for current NSX for vSphere issues. Here’s what’s new and trending:

Please take note of key updates to the following important End of General Support and End of Availability events:

New and important issues:

NSX for Multi-Hypervisor:

New master playbook KBs:

How to track the top field issues:

 

Host disconnected from vCenter and VMs showing as inaccessible

Another deep-dive troubleshooting blog today from Nathan Small (twitter account: @vSphereStorage)
 
Description from customer:
 
Host is getting disconnected from vCenter and VMs are showing as inaccessible. Only one host is affected.
 
 
Analysis:
 
A quick review of the vmkernel log shows a log spew of H:0x7 errors to numerous LUNs. Here is a short snippet where you can see how frequently they are occurring (multiple times per second):
 
# cat /var/log/vmkernel.log
 
2016-01-13T18:54:42.994Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x412540b96e80) 0x28, CmdSN 0x8000006b from world 11725 to dev “naa.600601601b703400a4f90c3d0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.027Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x4125401b2580) 0x28, CmdSN 0x8000002e from world 11725 to dev “naa.600601601b70340064a24ada10fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.030Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x4125406d5380) 0x28, CmdSN 0x80000016 from world 11725 to dev “naa.600601601b7034000c70e4e610fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.542Z cpu67:8259)ScsiDeviceIO: 2326: Cmd(0x412540748800) 0x28, CmdSN 0x80000045 from world 11725 to dev “naa.600601601b70340064a24ada10fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.808Z cpu74:8266)ScsiDeviceIO: 2326: Cmd(0x412541229040) 0x28, CmdSN 0x8000003c from world 11725 to dev “naa.600601601b7034008e56670a11fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.088Z cpu38:8230)ScsiDeviceIO: 2326: Cmd(0x4124c0ff4f80) 0x28, CmdSN 0x80000030 from world 11701 to dev “naa.600601601b703400220f77ab15fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.180Z cpu74:8266)ScsiDeviceIO: 2326: Cmd(0x412540ccda80) 0x28, CmdSN 0x80000047 from world 11725 to dev “naa.600601601b70340042b582440668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.741Z cpu61:8253)ScsiDeviceIO: 2326: Cmd(0x412540b94480) 0x28, CmdSN 0x80000051 from world 11725 to dev “naa.600601601b70340060918f5b0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.897Z cpu63:8255)ScsiDeviceIO: 2326: Cmd(0x412540ff3180) 0x28, CmdSN 0x8000007a from world 11725 to dev “naa.600601601b7034005c918f5b0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.355Z cpu78:8270)ScsiDeviceIO: 2326: Cmd(0x412540f3b2c0) 0x28, CmdSN 0x80000039 from world 11725 to dev “naa.600601601b70340060918f5b0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.522Z cpu70:8262)ScsiDeviceIO: 2326: Cmd(0x41254073d0c0) 0x28, CmdSN 0x8000002c from world 11725 to dev “naa.600601601b7034000e3e97350668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.584Z cpu71:8263)ScsiDeviceIO: 2326: Cmd(0x412541021780) 0x28, CmdSN 0x80000067 from world 11725 to dev “naa.600601601b7034000e3e97350668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.803Z cpu63:8255)ScsiDeviceIO: 2326: Cmd(0x412540d20480) 0x28, CmdSN 0x80000019 from world 11725 to dev “naa.600601601b703400d24fc7620668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:46.253Z cpu74:8266)ScsiDeviceIO: 2326: Cmd(0x412540b96380) 0x28, CmdSN 0x8000006f from world 11725 to dev “naa.600601601b7034005e918f5b0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
 
The host-side status (H:0x7) literally translates to Storage Initiator Error, which makes it sound like there is something physically wrong with the card. One needs to understand that this status is sent up the stack from the HBA driver, so it is really up to those who write the driver to decide which conditions return this status. As there are no accompanying errors from the HBA driver, which in this case is the driver for a Brocade HBA, this is all we have to work with without enabling verbose logging in the driver. Verbose logging requires a reboot, so it is not always an option when investigating root cause. The exception is when the issue is ongoing, in which case rebooting a host to capture this data is a viable option.
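
As a rough way to quantify the spew above, the failures can be tallied per device. This is a minimal Python sketch rather than a VMware tool: the regular expression is inferred only from the sample lines in this post (other ESXi builds may format the message differently), and `tally_host_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines in this post (an assumption, not a spec).
# \W tolerates either straight or typographic quotes around the device name.
LINE_RE = re.compile(
    r"to dev \W(?P<dev>naa\.[0-9a-f]+)\W failed H:(?P<host>0x[0-9a-fA-F]+)"
)

def tally_host_errors(lines, host_status="0x7"):
    """Count failed commands per device for one host status code."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("host") == host_status:
            counts[m.group("dev")] += 1
    return counts

# Three of the lines quoted above, copied as they appear in the post.
sample = [
    '2016-01-13T18:54:42.994Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x412540b96e80) 0x28, CmdSN 0x8000006b from world 11725 to dev “naa.600601601b703400a4f90c3d0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    '2016-01-13T18:54:43.027Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x4125401b2580) 0x28, CmdSN 0x8000002e from world 11725 to dev “naa.600601601b70340064a24ada10fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    '2016-01-13T18:54:43.542Z cpu67:8259)ScsiDeviceIO: 2326: Cmd(0x412540748800) 0x28, CmdSN 0x80000045 from world 11725 to dev “naa.600601601b70340064a24ada10fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
]

for dev, n in tally_host_errors(sample).most_common():
    print(n, dev)
```

On a real host the input would be the lines of a copy of vmkernel.log. If many LUNs show up with similar counts, that points away from any single device and toward something they have in common.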
 
Taking a LUN as an example from ‘esxcfg-mpath -b’ output to get a view of the paths and targets:
 
# esxcfg-mpath -b
 
naa.600601601b703400b6aa124c0668e311 : DGC Fibre Channel Disk (naa.600601601b703400b6aa124c0668e311)
   vmhba0:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba0:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
   vmhba2:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba2:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
Let’s look at the adapter statistics for all HBAs. I would recommend always using localcli over esxcli when troubleshooting, as esxcli requires hostd to be functioning properly:
 
# localcli storage core adapter stats get
 
vmhba0:
   Successful Commands: 844542177
   Blocks Read: 243114868277
   Blocks Written: 25821448417
   Read Operations: 395494703
   Write Operations: 405753901
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 35403
   Failed Blocks Read: 57744
   Failed Blocks Written: 16843
   Failed Read Operations: 8224
   Failed Write Operations: 16450
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
vmhba1:
   Successful Commands: 502595840 <– Far fewer successful commands than the other adapters
   Blocks Read: 116436597821
   Blocks Written: 16509939615
   Read Operations: 216572537
   Write Operations: 245276523
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 10942696
   Failed Blocks Read: 12055379188 <– 12 billion failed blocks read! Other adapters are all less than 60,000
   Failed Blocks Written: 933809
   Failed Read Operations: 10895926
   Failed Write Operations: 25645
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
vmhba2:
   Successful Commands: 845976973
   Blocks Read: 244034940187
   Blocks Written: 26063852941
   Read Operations: 397564994
   Write Operations: 407538414
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 40468
   Failed Blocks Read: 44157
   Failed Blocks Written: 18676
   Failed Read Operations: 5506
   Failed Write Operations: 12152
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
vmhba3:
   Successful Commands: 866718515
   Blocks Read: 249837164491
   Blocks Written: 26492209531
   Read Operations: 406367844
   Write Operations: 416901703
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 37723
   Failed Blocks Read: 23191
   Failed Blocks Written: 139380
   Failed Read Operations: 7372
   Failed Write Operations: 14878
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
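
To compare the adapters on one number instead of eyeballing the counters, the failed-command ratio can be computed per HBA. The following is a hedged Python sketch: it assumes the "name: value" layout shown above, uses only the Successful Commands and Failed Commands figures from this post, and the helper names `parse_adapter_stats` and `failure_ratio` are hypothetical.

```python
def parse_adapter_stats(text):
    """Parse adapter-stats style output into {hba: {counter_name: int}}."""
    stats, current = {}, None
    for raw in text.splitlines():
        line = raw.split("<")[0].strip()   # drop any trailing "<-- note" annotation
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        if value.strip() == "":
            current = stats.setdefault(key, {})   # e.g. "vmhba1:" starts a new adapter
        elif current is not None:
            current[key.strip()] = int(value)
    return stats

def failure_ratio(counters):
    """Fraction of all commands on this adapter that failed."""
    failed = counters["Failed Commands"]
    return failed / (failed + counters["Successful Commands"])

# Figures copied from the output above (other counters omitted for brevity).
sample = """\
vmhba0:
   Successful Commands: 844542177
   Failed Commands: 35403
vmhba1:
   Successful Commands: 502595840
   Failed Commands: 10942696
vmhba2:
   Successful Commands: 845976973
   Failed Commands: 40468
vmhba3:
   Successful Commands: 866718515
   Failed Commands: 37723
"""

stats = parse_adapter_stats(sample)
for hba in sorted(stats, key=lambda h: failure_ratio(stats[h]), reverse=True):
    print(f"{hba}: {failure_ratio(stats[hba]):.4%}")
```

With these figures vmhba1 fails roughly 2% of its commands, while the other three adapters each fail well under 0.01%.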
 
 
Let’s see how often the vmkernel.log reports messages for each HBA:
 
# cat vmkernel.log |grep vmhba0|wc -l
112
 
# cat vmkernel.log |grep vmhba1|wc -l
8474 <– over 8000 times this HBA is mentioned! This doesn’t mean they are all errors, of course, but based on the log spew we already know is occurring, most of them likely are
 
# cat vmkernel.log |grep vmhba2|wc -l
222
 
# cat vmkernel.log |grep vmhba3|wc -l
335
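
The four grep runs above can also be done in a single pass over the log. This is only an illustrative Python sketch: `count_hba_mentions` is a hypothetical helper, and the sample strings are made-up placeholders, not real vmkernel output. Unlike a plain `grep vmhba1`, the word-boundary match does not count `vmhba10` as `vmhba1`.

```python
import re
from collections import Counter

HBA_RE = re.compile(r"\bvmhba\d+\b")

def count_hba_mentions(lines):
    """Count, per HBA, the number of lines that mention it (like grep | wc -l)."""
    counts = Counter()
    for line in lines:
        for hba in set(HBA_RE.findall(line)):   # set(): one count per line per HBA
            counts[hba] += 1
    return counts

# Placeholder strings for illustration only; NOT real vmkernel.log lines.
sample = [
    "placeholder: path event on vmhba1:C0:T3:L20",
    "placeholder: vmhba1 and vmhba3 both named on one line",
    "placeholder: nothing relevant here",
    "placeholder: vmhba10 is a different adapter from vmhba1",
]

for hba, n in sorted(count_hba_mentions(sample).items()):
    print(hba, n)
```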
 
Now let’s take a look at the zoning to see if multiple adapters are zoned to the exact same array targets (WWPN) in an attempt to determine if the issue is possibly array side or HBA side:
 
# esxcfg-mpath -b
 
naa.600601601b703400b6aa124c0668e311 : DGC Fibre Channel Disk (naa.600601601b703400b6aa124c0668e311)
   vmhba0:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba0:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
   vmhba2:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba2:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
Let’s isolate the HBAs so it is easier to visually compare the WWPNs of the array targets:
 
vmhba1:
 
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
vmhba3:
 
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
vmhba1 and vmhba3 are zoned to the exact same array ports yet only vmhba1 is experiencing communication issues/errors.
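
The same comparison can be made mechanically by grouping HBAs on the set of target WWPNs they see. This is a small Python sketch under assumptions: the regular expression is fitted to the `esxcfg-mpath -b` path lines shown in this post, and `targets_by_hba` and `hbas_sharing_targets` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Fitted to the path lines in this post: captures the HBA name and the target WWPN.
PATH_RE = re.compile(r"^\s*(vmhba\d+):C\d+:T\d+:L\d+ .*Target: WWNN: \S+ WWPN: (\S+)")

def targets_by_hba(lines):
    """Map each HBA to the set of array target WWPNs it has paths to."""
    targets = defaultdict(set)
    for line in lines:
        m = PATH_RE.match(line)
        if m:
            targets[m.group(1)].add(m.group(2))
    return targets

def hbas_sharing_targets(targets):
    """Return groups of HBAs that see exactly the same set of target WWPNs."""
    groups = defaultdict(list)
    for hba, wwpns in targets.items():
        groups[frozenset(wwpns)].append(hba)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

# The eight path lines for LUN 20, copied from the output above.
sample = """\
   vmhba0:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba0:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
   vmhba2:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba2:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
""".splitlines()

print(hbas_sharing_targets(targets_by_hba(sample)))
```

For this LUN the grouping pairs vmhba0 with vmhba2 and vmhba1 with vmhba3. That matters because a healthy HBA sharing targets with the noisy one argues against an array-side fault.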
 
 
Let’s look at the driver information under /proc/scsi/bfa/ by viewing (cat) the node information:
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 0
Serial Num: xxxxxxxxx32
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:9a
WWPN: 20:01:74:86:7a:ae:1c:9a
Instance num: 0
Target ID: 0 WWPN: 50:06:01:6b:47:20:7b:04
Target ID: 1 WWPN: 50:06:01:6b:47:20:7a:a8
Target ID: 2 WWPN: 50:06:01:63:47:20:7b:04
Target ID: 3 WWPN: 50:06:01:63:47:20:7a:a8
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 1
Serial Num: xxxxxxxxx32
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:9c
WWPN: 20:01:74:86:7a:ae:1c:9c
Instance num: 1
Target ID: 0 WWPN: 50:06:01:60:47:24:7b:04
Target ID: 1 WWPN: 50:06:01:68:47:24:7b:04
Target ID: 3 WWPN: 50:06:01:60:47:24:7a:a8
Target ID: 2 WWPN: 50:06:01:68:47:24:7a:a8
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 2
Serial Num: xxxxxxxxx2E
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:32
WWPN: 20:01:74:86:7a:ae:1c:32
Instance num: 2
Target ID: 0 WWPN: 50:06:01:6b:47:20:7b:04
Target ID: 1 WWPN: 50:06:01:6b:47:20:7a:a8
Target ID: 2 WWPN: 50:06:01:63:47:20:7b:04
Target ID: 3 WWPN: 50:06:01:63:47:20:7a:a8
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 3
Serial Num: xxxxxxxxx2E
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:34
WWPN: 20:01:74:86:7a:ae:1c:34
Instance num: 3
Target ID: 0 WWPN: 50:06:01:60:47:24:7b:04
Target ID: 1 WWPN: 50:06:01:68:47:24:7b:04
Target ID: 2 WWPN: 50:06:01:68:47:24:7a:a8
Target ID: 3 WWPN: 50:06:01:60:47:24:7a:a8
 
So all HBAs are on the same firmware, which is important from an observed-consistency perspective. Had the firmware versions been different, there might be something to go on, or at least a reason to verify whether there are known issues with that firmware level. Obviously they are using the same driver as well, since only one is loaded in the kernel.
 
We can see not only by the shared serial number above but also by the lspci output that these are two-port physical cards:
 
# lspci
 
000:007:00.0 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba0]
000:007:00.1 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba1]
000:009:00.0 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba2]
000:009:00.1 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba3]
 
The first set of numbers is read as Domain:Bus:Slot.Function, so vmhba0 and vmhba1 are both on Domain 0, Bus 7, Slot 0, at functions 0 and 1 respectively, which means it is a dual-port HBA.
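
The Domain:Bus:Slot.Function reading above can be expressed as a short Python sketch that groups HBA ports by physical card. It assumes the lspci line layout shown in this post (address first, vmhba alias in square brackets at the end) and keeps the address fields as strings, so it makes no claim about their number base. `ports_by_card` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Address is Domain:Bus:Slot.Function; the vmhba alias is in brackets at the end.
LSPCI_RE = re.compile(
    r"^([0-9a-f]+):([0-9a-f]+):([0-9a-f]+)\.([0-9a-f]+) .*\[(vmhba\d+)\]"
)

def ports_by_card(lines):
    """Group HBA ports by (domain, bus, slot); each key is one physical card."""
    cards = defaultdict(list)
    for line in lines:
        m = LSPCI_RE.match(line.strip())
        if m:
            domain, bus, slot, function, hba = m.groups()
            cards[(domain, bus, slot)].append((function, hba))
    # Order the ports on each card by function number.
    return {card: [hba for _, hba in sorted(ports)] for card, ports in cards.items()}

# The four lspci lines copied from the output above.
sample = [
    "000:007:00.0 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba0]",
    "000:007:00.1 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba1]",
    "000:009:00.0 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba2]",
    "000:009:00.1 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba3]",
]

for card, hbas in sorted(ports_by_card(sample).items()):
    print(card, hbas)
```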
 
So vmhba0 and vmhba1 are the same physical card, yet only vmhba1 is showing errors. The HBA chips themselves on a dual-port HBA are mostly independent of each other, so at least this means there isn’t a problem with the board or circuitry they both share. I say mostly because, although the physical ports and the HBA chips are independent of each other, they do share the same physical board and connection to the motherboard.
 
This host is running EMC PowerPath VE, so we know that in general the I/O load is evenly distributed across all HBAs and paths. I say in general because PowerPath VE is intelligent enough to send less I/O down paths that exhibit more errors or higher latency than other paths.
 
I believe we may be looking at either a cable issue (loose, faulty, or bad GBIC) between vmhba1 and the switch or the switch port itself that vmhba1 is connected to. Here is why:
 
1. vmhba1 is seeing thousands upon thousands of errors while the other HBAs are very quiet
2. vmhba1 and vmhba3 are zoned to the exact same targets yet only vmhba1 is seeing errors
3. vmhba0 and vmhba1 are the same physical card yet only vmhba1 is seeing errors
 
My recommendation would be to check the physical switch port error counters and possibly replace the cable to see if the errors subside. It is standard practice to reset the switch counters and then monitor them, so that may be needed here to validate that CRC errors or other fabric errors are still occurring.
 
Cheers,
Nathan (twitter account: @vSphereStorage)

Issues creating Desktops in VDI, and what to do about it

We want to highlight some mitigation techniques and a handy KB article today for those of you managing Horizon View desktops. We’re talking about those occasions when no desktop can be created or recomposed in your VDI environment and no tasks submitted from Connection Brokers are acted upon by vCenter Server.

Our Engineering team has been hard at work fixing many of the underlying causes of this, and the latest releases of View have all but eliminated these issues. However, if these issues show up in the latest View releases, we ask everyone to follow the specific steps in this KB: Pool settings are not saved, new pools cannot be created, and vCenter Server tasks are not processed in a Horizon View environment (2082413)

This KB contains several main steps. The first is collecting the log bundles from all connection brokers in the VDI environment and recording the time the issue was first observed. Steps 2 to 6 are basic steps that can potentially help address the issue. If issues persist, step 7 requests opening a support case and submitting the log bundles collected in step 1 along with the recorded time when the issue was first observed. You might also include any other useful information, such as whether any recent changes were made to the environment.

When opening your support case, please note this KB article number in the description of the case. That helps us get right on point ASAP.

Step 8 is what should address this issue without rebooting any connection broker: it stops all View services on all View connection brokers and then restarts them.

If step 8 does not resolve your issue, then the last step (9) involves rebooting all connection servers, and this has always addressed the issue in our experience.

Troubleshooting File Level Recovery with vSphere Data Protection – KBTV Webinars

This video is the second in a new series of free Webinars that we are releasing in which our Technical Support staff members present on various topics across a wide range of VMware’s product portfolio.

The title for this presentation is Troubleshooting File Level Recovery with vSphere Data Protection and it dives into some real world examples of how you can troubleshoot file level recovery issues with vSphere Data Protection.

This presentation was originally broadcast live on Thursday 5th March 2015.

To see the details of upcoming webinars in this series, see the Support Insider Blog post at New Free Webinars.

NOTE: This video is 17 minutes in length so it would be worth blocking out some time to watch it!

50 articles that fix EVERYTHING in Horizon View!

OK, our title may exhibit a teeny-tiny hint of hyperbole, but seriously, the following list of articles covers the majority of issues that you can solve yourself. We’ve posted these lists before but limited the number to twenty, but why do that? Your problem might be number twenty-one.

  1. Manually deleting linked clones or stale virtual desktop entries from the View Composer database in VMware View Manager and Horizon View (2015112)
  2. Correlating VMware products build numbers to update levels (1014508)
  3. Generating and importing a signed SSL certificate into VMware Horizon View 5.1/5.2/5.3/6.0 using Microsoft Certreq (2032400)
  4. Pool settings are not saved, new pools cannot be created, and vCenter Server tasks are not processed in a Horizon View environment (2082413)
  5. VMware Products and CVE-2014-3566 (POODLE) (2092133)
  6. VMware Horizon View Best Practices (1020305)
  7. Network connectivity requirements for VMware View Manager 4.5 and later (1027217)
  8. Manually deleting replica virtual machines in VMware Horizon View 5.x (1008704)
  9. Finding and removing unused replica virtual machines in the VMware Horizon View (2009844)
  10. Restart order of the View environment to clear ADLDS (ADAM) synchronization in View 4.5, 4.6, 5.0, and 5.1 (2068381)
  11. Collecting diagnostic information for VMware Horizon View (1017939)
  12. View Connection Server reports the error: [ws_TomcatService] STDOUT: java.lang.OutOfMemoryError: Java heap space (2009877)
  13. Legacy applications fail to start with the VMware View 6.0 or 6.0.1 agent installed (2091845)
  14. Provisioning VMware Horizon View desktops fails with error: View Composer Agent initialization error (16): Failed to activate software license. (1026556)
  15. Cannot detach a Persistent Disk in View Manager 4.5 and later (2007076)
  16. Provisioning View desktops fails due to customization timeout errors (2007319)
  17. The View virtual machine is not accessible and the View Administration console shows the virtual machine status as “Already Used” (1000590)
  18. Forcing replication between ADAM databases (1021805)
  19. Administration dashboard in VMware Horizon View 5.1/5.2/5.3 reports the error: Server’s certificate cannot be checked (2000063)
  20. Troubleshooting SSL certificate issues in VMware Horizon View 5.1 and later (2082408)
  21. Removing a standard (replica) connection server or a security server from a cluster of connection/security servers (1010153)
  22. Troubleshooting Persona Management (2008457)
  23. View Persona Management features do not function when Windows Client-Side Caching is in effect. (2016416)
  24. Resolving licensing errors when deploying virtual Office to a system with Office installed natively. (2107369)
  25. Manually deleting linked clones or stale virtual desktop entries from VMware View Manager (1008658)
  26. Troubleshooting a black screen when logging into a Horizon View virtual desktop using PCoIP (1028332)
  27. The PCoIP server log reports the error: Error attaching to SVGADevTap, error 4000: EscapeFailed (1029706)
  28. Connecting to the View ADAM Database (2012377)
  29. Disabling SSLv3 connections over HTTPS to View Security Server and View Connection Server (2094442)
  30. The Event database performance in VMware View 6.0.x is extremely slow (2094580)
  31. Intermittent provisioning issues and generic errors when Composer and vCenter Server are co-installed (2105261)
  32. Performing an end-to-end backup and restore for VMware View Manager (1008046)
  33. Installing VMware View Agent or View Composer fails with the error: The system must be rebooted before installation can continue (1029288)
  34. View Connection Server fails to replicate (2014488)
  35. Provisioning View desktops fail with the error: View Composer Fault: VC operation exceeded the task timeout limit set by View Composer (2030047)
  36. Calculating datastore selection for linked-clone desktops in Horizon View 5.2 or later releases (2047492)
  37. VMware Horizon View Admin dashboard for vCenter Server 5.1 displays the message: VC service is not working properly (2050369)
  38. Generating a Horizon View SSL certificate request using the Microsoft Management Console (MMC) Certificates snap-in (2068666)
  39. Reconnecting to the VDI desktop with PCoIP displays a black screen (2073945)
  40. PCI Scan indicates that TCP Port 4172 PCoIP Secure Gateway is vulnerable to POODLE (CVE-2014-3566) (2099458)
  41. Troubleshooting USB redirection problems in VMware View Manager (1026991)
  42. Location of VMware View log files (1027744)
  43. Migrating linked clone pools to a different or new datastore (1028754)
  44. Error during provisioning: Unable to find folder (1038077)
  45. Unable to connect to a VMware View Manager desktop via the Security Server from outside the firewall (1039021)
  46. VMware View Agent fails to uninstall (2000017)
  47. View Manager Admin console displays the error: Error during provisioning: Unexpected VC fault from View Composer (Unknown) (2014321)
  48. Consolidating disks associated with a backup snapshot fails with the error: The file is already in use (2040846)
  49. Troubleshooting VMware Horizon View HTML Access (2046427)
  50. USB redirection may not work on cloned images after upgrading master image from VMware Horizon View 5.1 to 5.2 and 5.3 (2051801)

Horizon View PCoIP issues?

Here’s our latest top list of KB articles you should know about when encountering issues with PCoIP with Horizon View. It can be a tricky thing to configure and troubleshoot even for the best of us, so here are some golden nuggets to help you on your way.

Troubleshooting Composer for VMware Horizon View

***UPDATE: We just published a great troubleshooting KB article for Composer that covers a lot of the common issues customers encounter here.

At some point, every View administrator who uses a linked clone pool is going to need to do some troubleshooting. Most linked clone troubleshooting involves a component called Composer.
What is Composer, and what does it do?
Composer is an add-on for VMware Horizon View and is used to build linked clone desktops. Details about linked clones and Composer operations can be found in my previous posts, What is a linked clone? and part II of that topic.

Today we will focus on troubleshooting Composer when it breaks.

We are in the process of compiling a KB which will serve as the go-to article for Composer. This will contain links to important KBs, common issues, and procedures for repair. In the meantime, I thought this tactic would be good to share.

Compatibility

Compatibility is more important than many admins realize. VMware builds, tests, certifies and supports components that are built to work together.

The Connection Server talks to:

  • Composer
  • View Agent
  • vCenter
  • Security Server
  • and the clients

Composer in turn, talks to:

  • vCenter
  • Connection Server
  • The hosts
  • Active Directory
  • The guest OS

As you can imagine, it’s easy for problems to balloon out of control if a component doesn’t talk properly to another. So, you need to ensure that every component is compatible and is designed to work with every other.

Here’s how:

Identifying where a problem exists is the first step to solving it. Composer can be nonfunctional because of factors that are entirely outside of Composer itself. For example, View doesn’t build desktops; that step is done by vCenter through API calls. If you are trying to build desktops and nothing is happening, it doesn’t mean View or Composer is at fault. vCenter needs to be functioning properly for View to be able to provision desktops. Along the same lines, Composer needs to be able to talk to all of the hosts in a cluster plus your Active Directory to be able to customize VMs, so if you have a dead host, Composer will fail.

Is Composer at fault then if it doesn’t work? Well, what about the guest VM? Does it get an IP address? Does it boot? If the answer to any of these is no, then Composer can’t do its job.

One of the tactics I take when Composer fails is to manually step through all of the processes involved.

  • Can I clone the base image?
  • Can I customize it?
  • Can I activate it?
  • Is it network accessible?

If the answer to any of these questions is no, the problem is outside of Composer. Understanding where the linked clone process fails is the key to resolving problems.

Top 20 vSphere 5.5 Support Topics

Here are our Top 20 Knowledge Base articles for vSphere 5.5 and VMware Hypervisor 5.5.

These KBs address the bulk of calls into our call centers for these products. See anything familiar in this list?

  1. VMware ESXi 5.x host experiences a purple diagnostic screen mentioning E1000PollRxRing and E1000DevRx
  2. Installing async drivers on ESXi 5.0, 5.1, and 5.5
  3. Determining Network/Storage firmware and driver version in ESXi/ESX 4.x and ESXi 5.x
  4. Collecting diagnostic information for VMware ESX/ESXi using the vSphere Client
  5. Re-pointing and re-registering VMware vCenter Server 5.1 / 5.5 and components
  6. Unmounting a LUN or detaching a datastore/storage device from multiple VMware ESXi 5.x hosts
  7. Upgrading to vCenter Server 5.5 best practices
  8. Installing or upgrading to ESXi 5.5 best practices
  9. Investigating virtual machine file locks on ESXi/ESX
  10. Creating a persistent scratch location for ESXi 4.x and 5.x
  11. Reducing the size of the vCenter Server database when the rollup scripts take a long time to run
  12. Broadcom 5719/5720 NICs using tg3 driver become unresponsive and stop traffic in vSphere
  13. Methods for upgrading to ESXi 5.5
  14. Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x
  15. Installing vCenter Server 5.5 best practices
  16. Restarting the Management agents on an ESXi or ESX host
  17. Powering off a virtual machine on an ESXi host
  18. Migrating the vCenter Server database from SQL Express to full SQL Server
  19. Resetting the VMware vCenter Server 5.x Inventory Service database
  20. Methods of upgrading to vCenter Server 5.5

Using esxtop to identify storage performance issues in ESX/ESXi

Today we have a new vSphere video that demonstrates how to use esxtop to identify storage performance issues in a vSphere ESX / ESXi environment.

The esxtop utility and the latency statistics that it provides are very useful when troubleshooting performance issues with SAN-connected storage. Watch our latest video and learn!

For additional information see VMware Knowledge Base article Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions) (1008205).

Uploading diagnostic information for VMware through the Secure FTP portal

VMware recently launched the new VMware Secure Data Transfer portal, which offers the ability to upload diagnostic information and files to VMware in a safe and secure way.

To address a Support Request, VMware Technical Support may request diagnostic information from the affected VMware products. Our video today provides a demonstration of the procedures necessary to upload diagnostic information to VMware using the Secure FTP (sftpsite.vmware.com) portal.

Uploading diagnostic information to VMware using the Secure FTP portal includes these methods:

  • Using your web browser and the HTML Interface
  • Using your web browser and the Java Applet
  • Using the command line from a Linux operating system
  • Using third-party clients

Notes

  • Internet Explorer 9 and above is supported.
  • Other supported browsers include Firefox, Chrome, and Safari.
  • When uploading with Internet Explorer 10/11, you may have to switch to compatibility mode.
  • Do not use the HTML interface to upload files larger than 2 GB.

This video is based on VMware Knowledge Base article Uploading diagnostic information for VMware through the Secure FTP portal (2069559).