I recently came across a situation where I was unable to log in to my ESXi host as root. This caught me off guard, as I hadn’t intentionally disabled root, but suddenly, and seemingly out of the blue, root logins stopped working.
Now just before this happened I had used the host to record a video showing how lockdown mode works. So I knew the problem must somehow be related to the host having been placed into lockdown mode and subsequently taken out. After a bit of testing, sure enough I confirmed that this was the case. The process of putting my host into lockdown mode and subsequently taking it out had unexpectedly removed the root privileges from the host. Why did this happen? Well, the answer is tied to how lockdown mode works and more specifically the role of the DCUI.Access list in allowing select users to override lockdown mode.
Let's start with a quick review of what lockdown mode is. Lockdown mode is a security setting used to disable direct user access to a host. When you disable direct user access, you require that the host be managed from the vCenter Server. This ensures that the security policies and access controls defined on the vCenter Server are always enforced; users aren't able to bypass vCenter security by logging into the host directly.
Of course, when running a host in lockdown mode you need to consider what happens if the host ever becomes isolated from the vCenter Server. Should a host lose network connectivity, you obviously wouldn’t be able to manage it from vCenter, and with direct user access disabled you would essentially become “locked out”. To avoid getting “locked out” we use the host setting “DCUI.Access”. DCUI.Access is a list of trusted users who are allowed to override lockdown mode. By default the list includes just one user, and that is “root”.
Now let's get back to my situation where the root user was unexpectedly disabled. First, let's look at the user permissions on the host before I enabled lockdown mode.
Here we see the “root” user, the “admin02” user and the special “vpxuser” and “dcui” users. In addition we see the “ESX Admins” group, which was created when I joined the host to my Active Directory domain. Note that all the accounts have been assigned the administrator role.
Before putting the host into lockdown mode I edited the DCUI.Access list and replaced the default “root” user with “admin02”. The reason I did this is that I wanted to show that I could use an account other than "root" to override the lockdown mode in my video.
Next, I put the host into lockdown mode. When a host is placed into lockdown mode, the local user privileges for the non-system generated accounts (i.e. accounts other than vpxuser and dcui) get removed. The effect of doing this is that local users can no longer access the host directly and must manage the host from vCenter.
With my host in lockdown mode, the only local user allowed to log in and override lockdown mode was “admin02”, because that was the only user I had listed in the DCUI.Access list. So I then logged into the DCUI as "admin02" and disabled lockdown mode.
After I logged in as “admin02” and took the host out of lockdown mode the admin privileges for the “admin02” account were restored, as were the privileges for the “ESX Admins” Active Directory group. However, the privileges for the “root” account, which had been removed when I put the host into lockdown mode, were not restored.
The reason is that when a host is taken out of lockdown mode, the admin privileges are only restored for those accounts listed in the DCUI.Access list. By removing the “root” account from the DCUI.Access list, placing the host into lockdown mode, and then taking the host out of lockdown mode, I unintentionally removed root access to the host. It wasn’t what I expected, but now that I understand what’s happening under the covers it makes sense why it happened.
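To make the sequence easier to follow, here is a minimal Python sketch of the behavior as I observed it. It is a toy model only and does not call any VMware API; the `Host` class and its method names are made up for illustration. It also leaves the “ESX Admins” Active Directory group out, since the model only covers the local accounts.

```python
# Toy model of the observed behavior. Nothing here talks to a real host;
# Host, enter_lockdown and exit_lockdown are hypothetical names.
SYSTEM_ACCOUNTS = {"vpxuser", "dcui"}  # system-generated accounts keep their privileges


class Host:
    def __init__(self, admins, dcui_access):
        self.admins = set(admins)            # local accounts holding the administrator role
        self.dcui_access = set(dcui_access)  # users allowed to override lockdown mode
        self.lockdown = False

    def enter_lockdown(self):
        # Privileges are removed from every non-system account.
        self.admins &= SYSTEM_ACCOUNTS
        self.lockdown = True

    def exit_lockdown(self):
        # Privileges come back only for accounts listed in DCUI.Access.
        self.admins |= self.dcui_access
        self.lockdown = False


def simulate(dcui_access):
    """Put a host into lockdown mode and take it out again; return the resulting admins."""
    host = Host({"root", "admin02", "vpxuser", "dcui"}, dcui_access)
    host.enter_lockdown()
    host.exit_lockdown()
    return host.admins


print(sorted(simulate({"admin02"})))          # ['admin02', 'dcui', 'vpxuser']
print(sorted(simulate({"root", "admin02"})))  # ['admin02', 'dcui', 'root', 'vpxuser']
```

With only “admin02” in the list, root never gets its privileges back. With both users in the list, root survives the round trip.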
Understanding how this works, my recommendation is to always leave the "root" account in the DCUI.Access list. While I do advocate limiting the use of a shared "root" account, I don't advise disabling or removing this user.
Is this a bug? Well, technically nothing is broken, and I was able to quickly restore “root” access by simply adding the “root” account back to the DCUI.Access list and bouncing the host in and out of lockdown mode again. However, this behavior was unexpected. Perhaps it’s an education issue that just needs better documentation? Maybe we should add a cautionary note in the UI warning you that if you remove the “root” user from the DCUI.Access list you are effectively disabling root access to the host? Maybe we should do a combination of the two? What do you think?
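The recovery described above can be sketched the same way. Again, this is only a toy model under the assumption that leaving lockdown mode grants the administrator role to whoever is in the DCUI.Access list at that moment; `lockdown_cycle` is a hypothetical helper, not a real command or API.

```python
# Toy model of the recovery. lockdown_cycle is a made-up helper,
# not a VMware API call.
SYSTEM_ACCOUNTS = frozenset({"vpxuser", "dcui"})


def lockdown_cycle(admins, dcui_access):
    """Return the admin set after entering and then leaving lockdown mode."""
    during_lockdown = set(admins) & SYSTEM_ACCOUNTS  # only system accounts remain
    return during_lockdown | set(dcui_access)        # DCUI.Access users regain admin


# State of the host after root was lost.
admins = {"admin02", "vpxuser", "dcui"}
dcui_access = {"admin02"}

# Add root back to the list, then bounce the host in and out of lockdown mode.
dcui_access.add("root")
admins = lockdown_cycle(admins, dcui_access)

print("root" in admins)  # True
```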
Follow me on Twitter @VMwareESXi.