Home > Blogs > Support Insider > Category Archives: Management

Category Archives: Management

User account locked in vCenter Server Appliance

vCSAWe’ve recently noticed a number of cases where vSphere administrators become locked out of their accounts or receive reports of incorrect passwords in the vCenter Server Appliance. If you find yourself in this position, here are two articles that address these issues:

KB 2034608
When attempting to log into the VMware vSphere 5.1, 5.5, or 6.0 Web Client you observe the following symptom: “User account is locked. Please contact your administrator.” This often occurs if the wrong password was entered multiple times. Waiting the default 15 minutes lockout period will allow to attempt the login again. If after multiple attempts, you are still not successful, you may need to reset the password.

KB 2069041
When attempting to log into the vCenter Server 5.5 and 6.0 Appliance, you experience symptoms where the root account is locked out. This often occurs because the vCenter Server appliance has a default 90 password expiration policy. Steps on how to modify the password expiration policies and to unlock the password.

Host disconnected from vCenter and VMs showing as inaccessible

Another deep-dive troubleshooting blog today from Nathan Small (twitter account: @vSphereStorage)
 
Description from customer:
 
Host is getting disconnected from vCenter and VMs are showing as inaccessible. Only one host is affected.
 
 
Analysis:
 
A quick review of the vmkernel log shows a log spew of H:0x7 errors to numerous LUNs. Here is a short snippet where you can see how frequently they are occurring (multiple times per second):
 
# cat /var/log/vmkernel.log
 
2016-01-13T18:54:42.994Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x412540b96e80) 0x28, CmdSN 0x8000006b from world 11725 to dev “naa.600601601b703400a4f90c3d0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.027Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x4125401b2580) 0x28, CmdSN 0x8000002e from world 11725 to dev “naa.600601601b70340064a24ada10fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.030Z cpu68:8260)ScsiDeviceIO: 2326: Cmd(0x4125406d5380) 0x28, CmdSN 0x80000016 from world 11725 to dev “naa.600601601b7034000c70e4e610fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.542Z cpu67:8259)ScsiDeviceIO: 2326: Cmd(0x412540748800) 0x28, CmdSN 0x80000045 from world 11725 to dev “naa.600601601b70340064a24ada10fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:43.808Z cpu74:8266)ScsiDeviceIO: 2326: Cmd(0x412541229040) 0x28, CmdSN 0x8000003c from world 11725 to dev “naa.600601601b7034008e56670a11fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.088Z cpu38:8230)ScsiDeviceIO: 2326: Cmd(0x4124c0ff4f80) 0x28, CmdSN 0x80000030 from world 11701 to dev “naa.600601601b703400220f77ab15fae211” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.180Z cpu74:8266)ScsiDeviceIO: 2326: Cmd(0x412540ccda80) 0x28, CmdSN 0x80000047 from world 11725 to dev “naa.600601601b70340042b582440668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.741Z cpu61:8253)ScsiDeviceIO: 2326: Cmd(0x412540b94480) 0x28, CmdSN 0x80000051 from world 11725 to dev “naa.600601601b70340060918f5b0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:44.897Z cpu63:8255)ScsiDeviceIO: 2326: Cmd(0x412540ff3180) 0x28, CmdSN 0x8000007a from world 11725 to dev “naa.600601601b7034005c918f5b0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.355Z cpu78:8270)ScsiDeviceIO: 2326: Cmd(0x412540f3b2c0) 0x28, CmdSN 0x80000039 from world 11725 to dev “naa.600601601b70340060918f5b0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.522Z cpu70:8262)ScsiDeviceIO: 2326: Cmd(0x41254073d0c0) 0x28, CmdSN 0x8000002c from world 11725 to dev “naa.600601601b7034000e3e97350668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.584Z cpu71:8263)ScsiDeviceIO: 2326: Cmd(0x412541021780) 0x28, CmdSN 0x80000067 from world 11725 to dev “naa.600601601b7034000e3e97350668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:45.803Z cpu63:8255)ScsiDeviceIO: 2326: Cmd(0x412540d20480) 0x28, CmdSN 0x80000019 from world 11725 to dev “naa.600601601b703400d24fc7620668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-01-13T18:54:46.253Z cpu74:8266)ScsiDeviceIO: 2326: Cmd(0x412540b96380) 0x28, CmdSN 0x8000006f from world 11725 to dev “naa.600601601b7034005e918f5b0668e311” failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
 
The Host side error (H:0x7) literally translates to Storage Initiator Error, which makes it sounds like there is something physical wrong with the card. One needs to understand that this status is sent up the stack from the HBA driver so really it is up to the those that write the driver to use this status for certain conditions. As there are no accompanying errors from the HBA driver, which in this case is a Brocade HBA, this is all we have to work with without enabling verbose logging in the driver. Verbose logging requires a reboot so this is not always an option when investigating root cause. The exception would be that the issue in ongoing so rebooting a host to capture this data is a viable option.
 
Taking a LUN as an example from ‘esxcfg-mpath -b’ output to get a view of the paths and targets:
 
# esxcfg-mpath -b
 
naa.600601601b703400b6aa124c0668e311 : DGC Fibre Channel Disk (naa.600601601b703400b6aa124c0668e311)
   vmhba0:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba0:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
   vmhba2:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba2:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
Let’s look at the adapter statistics for all HBAs. I would recommend always using localcli over esxcli when troubleshoot as esxcli requires hostd to be functioning properly:
 
# localcli storage core adapter stats get
 
vmhba0:
   Successful Commands: 844542177
   Blocks Read: 243114868277
   Blocks Written: 25821448417
  Read Operations: 395494703
   Write Operations: 405753901
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 35403
   Failed Blocks Read: 57744
   Failed Blocks Written: 16843
   Failed Read Operations: 8224
   Failed Write Operations: 16450
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
vmhba1:
   Successful Commands: 502595840 <– Far less successful commands than the other adapters
   Blocks Read: 116436597821
   Blocks Written: 16509939615
   Read Operations: 216572537
   Write Operations: 245276523
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 10942696
   Failed Blocks Read: 12055379188 <– 12 billion failed blocks read! Other adapters are all less than 60,000
   Failed Blocks Written: 933809
   Failed Read Operations: 10895926
   Failed Write Operations: 25645
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
vmhba2:
   Successful Commands: 845976973
   Blocks Read: 244034940187
   Blocks Written: 26063852941
   Read Operations: 397564994
   Write Operations: 407538414
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 40468
   Failed Blocks Read: 44157
   Failed Blocks Written: 18676
   Failed Read Operations: 5506
   Failed Write Operations: 12152
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
vmhba3:
   Successful Commands: 866718515
   Blocks Read: 249837164491
   Blocks Written: 26492209531
   Read Operations: 406367844
   Write Operations: 416901703
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 37723
   Failed Blocks Read: 23191
   Failed Blocks Written: 139380
   Failed Read Operations: 7372
   Failed Write Operations: 14878
   Failed Reserve Operations: 0
   Total Splits: 0
   PAE Commands: 0
 
 
Let’s see how often the vmkernel.log reports messages for that HBA:
 
# cat vmkernel.log |grep vmhba0|wc -l
112
 
# cat vmkernel.log |grep vmhba1|wc -l
8474 <– over 8000 times this HBA is mentioned! This doesn’t mean they are all errors, of course, but based on the log spew we know is already occurring it means it likely is
 
# cat vmkernel.log |grep vmhba2|wc -l
222
 
# cat vmkernel.log |grep vmhba3|wc -l
335
 
Now let’s take a look at the zoning to see if multiple adapters are zoned to the exact same array targets (WWPN) in attempt to determine if the issue is possibly array side or HBA side:
 
# esxcfg-mpath -b
 
naa.600601601b703400b6aa124c0668e311 : DGC Fibre Channel Disk (naa.600601601b703400b6aa124c0668e311)
   vmhba0:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba0:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9a WWPN: 20:01:74:86:7a:ae:1c:9a  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
   vmhba2:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:63:47:20:7a:a8
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba2:C0:T1:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:32 WWPN: 20:01:74:86:7a:ae:1c:32  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:6b:47:20:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
Let’s isolate the HBAs so they are easier to visually compare the WWPN of the array targets:
 
vmhba1:
 
   vmhba1:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba1:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:9c WWPN: 20:01:74:86:7a:ae:1c:9c  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
vmhba3:
 
   vmhba3:C0:T3:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:60:47:24:7a:a8
   vmhba3:C0:T2:L20 LUN:20 state:active fc Adapter: WWNN: 20:00:74:86:7a:ae:1c:34 WWPN: 20:01:74:86:7a:ae:1c:34  Target: WWNN: 50:06:01:60:c7:20:7a:a8 WWPN: 50:06:01:68:47:24:7a:a8
 
vmhba1 and vmhba3 are zoned to the exact same array ports yet only vmhba1 is experiencing communication issues/errors.
 
 
Let’s look at the driver information under /proc/scsi/bfa/ by viewing (cat) the node information:
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 0
Serial Num: xxxxxxxxx32
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:9a
WWPN: 20:01:74:86:7a:ae:1c:9a
Instance num: 0
Target ID: 0 WWPN: 50:06:01:6b:47:20:7b:04
Target ID: 1 WWPN: 50:06:01:6b:47:20:7a:a8
Target ID: 2 WWPN: 50:06:01:63:47:20:7b:04
Target ID: 3 WWPN: 50:06:01:63:47:20:7a:a8
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 1
Serial Num: xxxxxxxxx32
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:9c
WWPN: 20:01:74:86:7a:ae:1c:9c
Instance num: 1
Target ID: 0 WWPN: 50:06:01:60:47:24:7b:04
Target ID: 1 WWPN: 50:06:01:68:47:24:7b:04
Target ID: 3 WWPN: 50:06:01:60:47:24:7a:a8
Target ID: 2 WWPN: 50:06:01:68:47:24:7a:a8
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 2
Serial Num: xxxxxxxxx2E
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:32
WWPN: 20:01:74:86:7a:ae:1c:32
Instance num: 2
Target ID: 0 WWPN: 50:06:01:6b:47:20:7b:04
Target ID: 1 WWPN: 50:06:01:6b:47:20:7a:a8
Target ID: 2 WWPN: 50:06:01:63:47:20:7b:04
Target ID: 3 WWPN: 50:06:01:63:47:20:7a:a8
 
Chip Revision: Rev-E
Manufacturer: Brocade
Model Description: Brocade-1741
Instance Num: 3
Serial Num: xxxxxxxxx2E
Firmware Version: 3.2.3.2
Hardware Version: Rev-E
Bios Version: 3.2.3.2
Optrom Version: 3.2.3.2
Port Count: 2
WWNN: 20:00:74:86:7a:ae:1c:34
WWPN: 20:01:74:86:7a:ae:1c:34
Instance num: 3
Target ID: 0 WWPN: 50:06:01:60:47:24:7b:04
Target ID: 1 WWPN: 50:06:01:68:47:24:7b:04
Target ID: 2 WWPN: 50:06:01:68:47:24:7a:a8
Target ID: 3 WWPN: 50:06:01:60:47:24:7a:a8
 
So all HBAs are the same firmware, which is important from a observed consistency perspective. Had the firmware versions been different then there might be something to go on, or at least verify whether there are issues with that firmware level. Obviously they are using the same driver as well since only one is loaded in the kernel.
 
We can see not only by the shared serial number above but also by the lspci output that these are 2 port physical cards:
 
# lspci
 
000:007:00.0 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba0]
000:007:00.1 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba1]
000:009:00.0 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba2]
000:009:00.1 Serial bus controller: Brocade Communications Systems, Inc. Brocade-1010/1020/1007/1741 [vmhba3]
 
The first set of numbers are read as Domain:Bus:Slot.Function so vmhba0 and vmhba1 are both on Domain 0, Bus 7, Slot 0, amd function 0 and 1 respectively, which means it is a dual port HBA.
 
So vmhba0 and vmhba1 are the same physical card yet only vmhba1 is showing errors. The HBA chips themselves on a dual port HBA are mostly independent of each other so at least this means there isn’t a problem with the board or circuitry they both share. I say mostly since the physical ports are independent of each other as well as the HBA chip however they do share the same physical board and connection on the motherboard.
 
This host is running EMC PowerPath VE so we know that in general the I/O loads is evenly distributed across all HBAs and paths evenly. I say in general as PowerPath VE is intelligent enough to use paths that exhibit more errors than other paths less frequently, as well as paths that are more latent.
 
I believe we may be looking at either a cable issue (loose, faulty, or bad GBIC) between vmhba1 and the switch or the switch port itself that vmhba1 is connected to. Here is why:
 
1. vmhba1 is seeing thousands upon thousands of errors while the other HBAs are very quiet
2. vmhba1 and vmhba3 are zoned to the exact same targets yet only vmhba1 is seeing errors
3. vmhba0 and vmhba1 are the same physical card yet only vmhba 1 is seeing errors
 
My recommendation would be to check the physical switch port error counters and possibly replace the cable to see if the errors subside. It is standard practice to reset the switch counters and monitor to ensure errors are still happening so may be needed to do that to validate that the CRC errors or other fabric errors are still occurring.
 
Cheers,
Nathan (twitter account: @vSphereStorage)

vRealize Operations Manager 6.1.0 – new KB content

Exciting news for users of vRealize Operations Manager 6.1.0. We’ve been busy updating and creating new Knowledgebase articles to support the product. Enjoy!

New:

Updated:

Unable to access Administration and Licensing in vSphere Web Client

After upgrading or deploy the vCenter Server 6.0 or vCenter Server Appliance 6.0, some customers are being denied access to some features. Namely-

  • You are unable to access or see the Single Sign-On administration section
  • When attempting to access the System Configuration (located at: Administration > System Configuration) section, you are presented with:
    You do not have permissions to view this page. You must be a member of the SystemConfiguration.Administrators group in vCenter Single Sign-On to access System Configuration.
  • When attempting to access the Licensing administration (located at: Administration > Licensing > Licenses or Administration > Licensing > Reports) section, you are presented with:
    To view and manage licenses, you must have the Global.Licenses privilege on the vCenter Server system where this vSphere Web Client runs.

If you are encountering any of these messages you will want to check out this recent KB article:
Unable to access new Administration and Licensing features in the vSphere Web Client 6.0 (2120255)

Administration and Licensing in vSphere Web Client

ALERT: When removing CPU from VM configuration hard disks and nic are removed

VMware Support AlertVMware has become aware of situation whereby hard disks and/or nics may be removed from a virtual machine when reconfiguring VM workflows in vRealize Automation.

We have identified two distinct scenarios where this might happen and so have create two separate KB articles. Please familiarize yourself with both articles so that you can avoid these situations.

  1. When performing reconfigure operations in multiple browser tabs In VMware vRealize Automation, the hard disks and NICs are removed unexpectedly (2124198)
  2. In VMware vRealize Automation, the hard disks are removed unexpectedly when reconfiguring a virtual machine that has RDM disks (2124657)

Any updates to this information will be made in the KB articles themselves. The Additional Information section details how to receive these updates.

Fresh vSphere 6 KB articles!

vSphere 6.0 has been out now for a few weeks and you early adopters have been busy kicking the tires. We’ve heard some very encouraging things about this release ie: the web client improvements. It’s always interesting and top of mind for us to see what issues emerge in everyone’s environments and we monitor support requests coming into support as well as social media to see what customers run into.

Here’s an fresh list of Knowledgebase articles we’ve created to address some of these inquiries. Familiarize yourself with the list and of course share with your colleagues using the buttons on this page.

Database compatibility issues during upgrade

Deprecated VMFS volume errors

Backup failures/CBT mem heap issues

Replace certificates for vSphere 6.0

Decommissioning a vCenter Server or Platform Services Controller

Troubleshooting when virtual machine operations are greyed out

Hello, I am a Technical Support Engineer at VMware. I would like to talk about an issue I worked on recently with a customer that may come up for other users. The Support Request (SR) came to me with this description: “The Virtual Machine options are not working”. I had to ask myself, “What do you mean by ‘not working’”? I quickly called the customer and we got a remote session going so that I could see for myself what the problem was.

The problem was with only one Virtual Machine in that vCenter instance. I found that the virtual machine options were disabled (greyed-out) when I right-clicked the problematic virtual machine.

Note: Click an image to see the full-sized version.

The customer at this point mentioned that he had tried to take a backup using third-party software. He saw this error message in the “Recent Tasks” pane:

Error: Another task in progress

My initial hunch was that there was a job running for the virtual machine in the background, but I needed to verify it wasn’t a permissions problem.

Permissions are not set correctly

First I checked the permissions given to that virtual machine. The customer had a very small environment, so I did not spend too much time checking the complete permissions given for the entire vSphere environment. Instead, I went through the following steps:

Note: If your environment is large and you have multiple vSphere administrators with different permissions, permissions on a particular virtual machine machine might be incorrect/missing. In this case, the vSphere root administrator needs to ensure that there are sufficient permissions given for you to administer the virtual machine, at all levels.

  1. Check the Permissions tab for the virtual machine, to make sure that your name is listed there with the proper permission assigned. If your name is not there, but an AD/Local group is listed, then make sure that your name is added to the AD/Local group.Here, the user Testuser has the Virtual Machine Administrator role:
  2. If the permissions are defined at the host, cluster, Datacenter, or vCenter level:
      1. Apply the required permissions to the user/group.
      2. Make sure that the permission is propagated to all the objects.
      3. Make sure that your name is listed in the local/AD group.Here, the group Virtual Machine Administrator has the Virtual Machine Administrator role:

  3. Roles can also be assigned in the folder level. Go to the VMs and Templates view and make sure the folder where the Virtual Machine located has all required privileges assigned.In the below example, VM1,VM2 and VM3 are located in the folder Test. The Virtual Machine Administrator group is assigned with “Virtual Machine Administrator” role.
    In the examples above, the role “Virtual Machine Administrator” is not a default role, it is created with privileges to Administrate Virtual Machines. To know what privileges are required, I always refer to the Required Privileges for Common Tasks section in: Virtual Machine Administration Guide.

With that set of checks, we can now go on to the next step, and this comes back to my hunch because of the error message which I noticed in the “Recent Task”.

Tasks Running in the Background

We should always be aware that certain jobs for any given Virtual Machine may be running in the background which will not allow any changes/operations in the Virtual Machine. These background jobs cannot be seen in the “Task and Event” tab in the vCenter server or in the “Recent Tasks”.

The method we look for these background jobs is by logging into the host via SSH where the problem Virtual Machine is running. Here are the steps to find any running jobs which cannot be seen in the vSphere client.

1. Log into the ESX/ESXi host using a SSH client.
2. Run the following command to list all the Virtual Machines registered in the host. We are running this specifically to find the vmid for the problem Virtual Machine.

vim-cmd vmsvc/getallvms

The output appears as:

3. Check the tasks running on the Virtual Machine by running the command:

vim-cmd vmsvc/get.tasklist vmid

In our example, we run the command against the Virtual Machine VM3. It indicates there are 2 jobs/tasks running on that Virtual Machine.

The output shows as below if there is no jobs/tasks running.

4. Now you you have a choice to either wait for the jobs to complete, or restart the Management agent to terminate this job.

In this particular support case I was working we found a snapshot job that was triggered by the backup agent running in the Virtual Machine. I stopped the job by restarting the management agent on that host. For this customer, this solved the problem, but there is one more reason this might happen. I had a word with one my colleagues, and he pointed out that you could also encounter this if you have an invalid entry in your VMX file. Let’s go one step further and show you how to check this.

Invalid entries in VMX file

This type of issue might happen if a vmx file has invalid parameters or blank lines in it. You can resolve this issue by manually removing the invalid arguments or deleting the blank lines.

Caution: If incorrectly done, your virtual machine may fail to start or operate incorrectly. Always take a backup of your .vmx file before modifying it.

  1. Open the .vmx file using any text editor.
  2. Search for any blank lines and delete them.
    Note
    : To delete a single line using vi editor, press d twice.
  3. Compare the .vmx file with a working virtual machine .vmx file and see if there are any invalid arguments.
  4. To apply the changes, reload the .vmx file by running the command:
vim-cmd vmsvc/reload vmid

Note: The Default location of the vmx file is:

Vmfs/volumes/name_of_the_datastore/vm_name/vm_name.vmx

Summary

Symptom:
Virtual machine operations are grayed out

Possible Causes:

  • Permissions are not set correctly
  • Tasks Running in the Background
  • vmx file is corrupted

For further information on the steps used in this article, refer to: Troubleshooting when virtual machine options are grayed out in vSphere Client (2048748)

Hope this blog post helps you out a little bit. Have a great day!

New Network port diagram for vSphere 5.x

Over the past few weeks we have been working on constructing a brand new network diagram, depicting ports in use for vSphere 5.x

These diagrams have been very popular in the past and we hope you like this one too! We created Knowledgebase article: Network port diagram for vSphere 5.x (2054806) as a container for the pdf diagram. The pdf also lists all of the ports used in tabular format.

If you’d like to see more of these, tell us in the comments section below!

Network port diagram for vSphere 5

Alternate download location.

Note: This information provided is on a best effort basis. VMware will endeavor to update the diagram as new releases come out.

Physical or Appliance – Upgrading to vCenter Server 5.1

The other day we received this question from a customer via Twitter:

@VMwareCares planning to upgrade to 5.1 from 5.0 vcenter. What’s recommended physical or appliance? Ups and downs side of each?

We thought a few more of you might have the same questions so we decided we would take the opportunity to explain the differences between vCenter server and vCenter appliance and under what situation which one should be opted for, over the other.

The vCenter Server Appliance (vCSA) is a preconfigured Linux-based virtual machine optimized for running vCenter Server and associated services. Versions 5.0.1 and 5.1 of the vCSA uses PostgreSQL for the embedded database instead of IBM DB2, which was used in vCenter Server Appliance 5.0 The vCSA embedded postSQL DB supports 5 hosts / 50 virtual machines, with an Oracle DB the vCSA can support 1000 hosts and 10,000 vms. If you configure your vCSA to use an external instance of Single Sign On (SSO), the external SSO instance must be hosted on another vCenter Server Appliance; it cannot be hosted on a Windows machine.

vCenter Server can be installed on a windows Guest OS and can be connected to Oracle or Microsoft SQL. SSO can be installed on the same Guest OS or can be on a different machine. It should be noted that patching of of the vCenter Appliance is not supported.

Below is a table listing more of the differences between the products.

Features vCenter Server vCenter Server Appliance
Guest OS Any Supported Guest OS Preconfigured Linux-based virtual machine (64-bit SUSE Linux Enterprise Server 11)
Database Supported Versions SQL Server and Oracle. PostgreSQL (built-in ) can have 5 hosts and 50 Virtual Machines.Supported External Oracle database.
System Requirement 2 vCPU and 4 GB RAM 2 vCPU and 4 GB RAM
Platform Physical or virtual machine Virtual Appliance
Installation Using binary provided in .zip or .ISO Deploying OVF
Update Manager Can be installed on same vCenter Server or on separate Guest OS. Separate install
Single Sign On (SSO) Can be installed on same vCenter Server or separate Guest OS. Pre-installed.
Networking IPv6 and IPv4 Support IPv4 Support
Linked Mode Supported Not Supported
SRM (Site Recovery Manager) Compatible with SRM Compatible with SRM
vSphere Web Client Can be installed on same vCenter server or separate machine. Pre-Installed.
Syslog Server Can be installed on vCenter Server or separate server and configured using plug-in. Pre-installed and does not have plug-in.
ESXi Dump collector Can be installed on vCenter Server or on a separate Guest OS. Pre-installed and does not have plug-in.
Multi-site SSO Supported Not Supported. Basic SSO only.
VSA (vSphere Storage Appliance Supported Not Supported
VMware View Supported Not Supported

SSL Certificate Automation Tool version 1.0.1

Last month we announced a new SSL Certificate Automation tool to help everyone with the implementation of custom certificates. Yesterday, we released the second version of it (version 1.0.1). This is a minor update which aims to simplify the replacement of certificates further by adding Certificate Signing Request (CSR) functionality to the tool. This functionality allows a user to quickly generate certificate requests (and consequently the private keys) for submission to the Certificate Authority.  The CSR functionality was the largest portion of manual steps, and as a result the update reduces the number of steps by over 15.

In addition, there are several minor bug fixes which were fixed which impacted tool functionality.

For further details and to download the latest version of the SSL tool see: Deploying and Using the SSL Certificate Automation Tool (2041600)

We hope these additions provide useful for everyone!