“It’s a Unix system, I know this!”

Every fellow geek who first saw Jurassic Park twenty years ago (has it really been that long??) cringed when Lex Murphy sat down at a Silicon Graphics workstation and exclaimed the line above. I’m reminded of that line all the time when I talk to customers who treat their ESXi systems like they would a Unix or Linux system. I’m here to tell you, ESXi is not one.

A shell does not an OS make

Did you know you can run a Unix bash shell on Windows? Heck, you can even run a Unix bash shell on OpenVMS! Neither of them is a Unix system, obviously! And neither is ESXi.

Logging into an ESXi shell, whether via SSH or via the local console using ALT-F1, brings you into a Unix-like shell.

Lots of familiar commands like grep and ls work great. But what you’re logged into is a “BusyBox” shell. And underneath, you don’t have a Linux kernel; you have the VMware vmkernel, and with it comes an amazing API set that lets you do things more efficiently and with better security control.

DISA STIG? Is this Top Gear for Geeks?

What prompted this blog post was an email I got a while back. Someone at VMware was working with a federal customer who was trying to understand the DISA STIG recommendations for ESXi. What’s a “DISA” and what’s a “STIG”, you ask? DISA is the US government’s Defense Information Systems Agency. A STIG is a Security Technical Implementation Guide. Department of Defense systems must conform to these guides. I looked through the STIG and found a number of concerns.

My reason for using the DISA STIG documents is not to call out the DISA folks, who have a huge responsibility for protecting the systems our government depends upon, but because the documents are publicly available. It’s kind of hard to get similar documents from corporate enterprises!

After digging into the STIG, I found a couple of controls and recommended fixes that really entertained me. My favorite is disabling copy/paste to/from virtual machine consoles. There are good reasons for disabling these features: you may not want people pasting code into the console of a virtualized server. However, the steps laid out, in my opinion, make things even worse from a security standpoint. Here is the text of the “fix”:

To edit a powered-down virtual machine’s .vmx file, first remove it from vCenter Server’s inventory. Manual additions to the .vmx file from ESXi will be overwritten by any registered entries stored in the vCenter Server database. Make a backup copy of the .vmx file. If the edit breaks the virtual machine, it can be rolled back to the original version of the file.

Open the vSphere/VMware Infrastructure (VI) Client and log in with appropriate credentials.
If connecting to vCenter Server, click on the desired host.
Click the Configuration tab.
Click Storage.
Right-click on the appropriate datastore and click Browse Datastore.
Navigate to the folder named after the virtual machine, and locate the <virtual machine>.vmx file.
Right-click the .vmx file and click Remove from inventory.

Temporarily disable Lockdown Mode and enable the ESXi Shell via the vSphere Client.

Open the vSphere/VMware Infrastructure (VI) Client and log in with appropriate credentials.
If connecting to vCenter Server, click on the desired host.
Click the Configuration tab.
Click Software, Security Profile, Services, Properties, ESXi Shell, and Options, respectively.
Start the ESXi Shell service, where/as required.

As root, log in to the ESXi host and locate the VM’s vmx file.
# find / | grep vmx

Add the following to the VM’s vmx file.
keyword = "keyval"

Where:
keyword = isolation.tools.copy.disable
keyval = TRUE

Re-enable Lockdown Mode on the host.

Re-register the VM with the vCenter Server:
Open the vSphere/VMware Infrastructure (VI) Client and log in with appropriate credentials.
If connecting to vCenter Server, click on the desired host.
Click the Configuration tab.
Click Storage.
Right-click on the appropriate datastore and click Browse Datastore.
Navigate to the folder named after the virtual machine, and locate the <virtual machine>.vmx file.
Right-click the .vmx file and click Add to inventory. The Add to Inventory wizard opens.
Continue to follow the wizard to add the virtual machine.
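Once you strip away the GUI steps, the STIG’s manual edit amounts to adding (or replacing) one key = "value" line in a plain-text .vmx file. The Python sketch below shows only that part, to illustrate what the procedure actually changes. It assumes one setting per line. The function name set_vmx_option is made up for this example. It is not an endorsement of editing .vmx files by hand.

```python
import re

# Matches a simple .vmx line of the form:  key = "value"
VMX_LINE = re.compile(r'^\s*([A-Za-z0-9_.:-]+)\s*=\s*"(.*)"\s*$')


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return a copy of vmx_text with key set to value.

    An existing entry for key (compared case-insensitively here) is
    replaced in place; otherwise a new line is appended at the end.
    Assumes one key = "value" pair per line.
    """
    lines = vmx_text.splitlines()
    new_line = f'{key} = "{value}"'
    for i, line in enumerate(lines):
        match = VMX_LINE.match(line)
        if match and match.group(1).lower() == key.lower():
            lines[i] = new_line
            return "\n".join(lines) + "\n"
    lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical .vmx contents for illustration
sample = 'displayName = "web01"\nmemSize = "4096"\n'
print(set_vmx_option(sample, "isolation.tools.copy.disable", "TRUE"), end="")
```

Notice what the sketch can’t know about: the copy of the configuration held in the vCenter Server database. That is why the STIG has you pull the VM out of inventory first.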

Remove the VM from inventory? Disable lockdown mode?!!? Edit the .VMX file???!! What…the…. I was going to break it all down into a numbered list and go through it step by step, but really, that would go on forever. Suffice it to say, I have a few problems with this. Here’s my list:

The whole procedure is fraught with the potential for human error
It doesn’t scale to thousands, hundreds or even tens of VMs
It opens more potential holes than it closes!
Not only that, it doesn’t take advantage of the capabilities of the vSphere infrastructure
It’s old school thinking for a new school infrastructure
Virtualization is supposed to make processes more agile and responsive, not add more burdens on the already over-taxed IT and Security teams!

There are better ways. I always start a troubleshooting session with “What problem are we trying to solve?” So let’s break down what we want to accomplish.

We want to assess the current value for ALL virtual machines.
1. What use is setting a value if you can’t assess it?
2. And if you have hundreds or even thousands of virtual machines, do you really want to check each one by hand?
If no value is set, then we either want to know that or we want to set the new value for all or a subset of virtual machines.
We want to do this in a way that doesn’t open more potential security holes (!!!)
We want to limit human error
We MUST do it at scale!
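The first two goals come down to sorting every VM into one of three buckets: set correctly, set to the wrong value, or not set at all. Here’s a minimal Python sketch of that classification. It assumes you already have each VM’s advanced settings as a dictionary, for example from an export. The inventory data and VM names are made up. The assess helper is hypothetical and not part of any VMware toolkit.

```python
from enum import Enum


class Status(Enum):
    COMPLIANT = "compliant"
    MISSING = "missing"            # setting not present in the VM's config
    NONCOMPLIANT = "noncompliant"  # present, but with the wrong value


def assess(inventory: dict[str, dict[str, str]], key: str, desired: str) -> dict[str, Status]:
    """Classify every VM in the inventory against one desired setting."""
    results = {}
    for vm_name, settings in inventory.items():
        if key not in settings:
            results[vm_name] = Status.MISSING
        elif settings[key].strip().lower() == desired.lower():
            results[vm_name] = Status.COMPLIANT
        else:
            results[vm_name] = Status.NONCOMPLIANT
    return results


# Hypothetical inventory: VM name -> advanced settings
inventory = {
    "web01": {"isolation.tools.copy.disable": "TRUE"},
    "db01": {"isolation.tools.copy.disable": "false"},
    "app01": {},
}

report = assess(inventory, "isolation.tools.copy.disable", "true")
for vm, status in sorted(report.items()):
    print(vm, status.value)
```

Whatever lands in MISSING or NONCOMPLIANT is your remediation list. Everything else can be left alone, which also keeps down the number of changes, and with it the chances for error.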

All of these are good security practices. If you’re doing it the “old” way, you’re not getting the most bang for your buck!

The vSphere Hardening Guide has two relevant guidelines: vm.disable-console-copy and vm.disable-console-paste. I’ll go through vm.disable-console-copy; the steps for paste are the same and are called out in the guide as well.
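Copy and paste follow the same steps, so it helps to keep the guideline-to-setting mapping in one place rather than retyping setting names. The illustrative Python sketch below builds PowerCLI command lines of the kind used later in this post from such a table. The copy setting name comes from this post. The paste key, isolation.tools.paste.disable, is assumed here as the matching name, so check it against your copy of the guide. The powercli_commands helper is hypothetical.

```python
# Hardening Guide guideline ID -> .vmx advanced setting it controls.
# The paste key is an assumption; confirm it against the Hardening Guide.
GUIDELINES = {
    "vm.disable-console-copy": "isolation.tools.copy.disable",
    "vm.disable-console-paste": "isolation.tools.paste.disable",
}


def powercli_commands(guideline_ids: list[str], whatif: bool = True) -> list[str]:
    """Build one PowerCLI remediation line per guideline ID.

    Defaults to -WhatIf so the generated commands are safe to try first.
    """
    commands = []
    for gid in guideline_ids:
        setting = GUIDELINES[gid]  # a KeyError on an unknown ID beats a silent typo
        cmd = f'Get-VM | New-AdvancedSetting -Name "{setting}" -Value $true -Force'
        if whatif:
            cmd += " -WhatIf"
        commands.append(cmd)
    return commands


for line in powercli_commands(sorted(GUIDELINES)):
    print(line)
```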

For my examples, I’ll use PowerCLI. I know, I know… I got an email from someone (no names, Edward) about a previous blog article asking why I couldn’t have supplied the Unix methods of doing this along with the Windows methods. Well, I’m most comfortable with PowerCLI, and my bash scripting is very rusty. The Unix methods, using vCLI, are called out in the Hardening Guide, but they’re limited for doing things at scale without additional coding. For the example below, the vCLI section of the Hardening Guide has no assessment procedure. Certainly you CAN do this, but my coding pencil would need to be sharpened quite a bit for me to do it. If you have examples you’d like to share, based on the examples below, I’d be happy to write up a follow-up blog post and give you full credit. 🙂

Security at Scale

I’ll assume you’re already connected to your vCenter Server or ESXi host using the Connect-VIServer cmdlet.

The screenshot shows the steps I took.

Let’s break this down.

Get-VM
1. This got me a list of all virtual machines
Get-VM | New-AdvancedSetting -Name "isolation.tools.copy.disable" -Value $true
1. This created the setting in the VMX file for every VM!
2. It set the value to True
3. Unlike the GUI, this setting can be applied while the virtual machine is powered on
4. If you need to re-apply this setting, use the -Force parameter
Get-VM | Get-AdvancedSetting -Name "isolation.tools.copy.disable" | Select Entity, Name, Value
1. This outputs a list of all the VMs that have the setting in their VMX file
2. VMs without the setting won’t show in the list
3. Piping the output of this command to Export-Csv will give you a neat CSV file you can send to your Security Guy
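VMs without the setting simply don’t appear in that report, so you find the gap by comparing the report with the full VM list. Here’s a minimal Python sketch that assumes two CSV exports you produce yourself. One is a plain VM list with a Name column. The other is the Get-AdvancedSetting report above, assuming its Entity column holds the VM name. The sample CSV contents and helper names are hypothetical. The sketch also skips a leading "#TYPE" line in case your PowerShell version writes one.

```python
import csv


def read_csv_rows(text: str) -> list[dict[str, str]]:
    """Parse Export-Csv style text, skipping a leading '#TYPE ...' line if present."""
    lines = [ln for ln in text.splitlines() if not ln.startswith("#TYPE")]
    return list(csv.DictReader(lines))


def vms_missing_setting(all_vms_csv: str, settings_csv: str, key: str) -> list[str]:
    """Return the VMs that have no row for `key` in the settings report."""
    all_vms = {row["Name"] for row in read_csv_rows(all_vms_csv)}
    has_setting = {
        row["Entity"] for row in read_csv_rows(settings_csv) if row["Name"] == key
    }
    return sorted(all_vms - has_setting)


# Hypothetical exports: a VM list, and the Get-AdvancedSetting report
all_vms_csv = '"Name"\n"web01"\n"db01"\n"app01"\n'
settings_csv = (
    '"Entity","Name","Value"\n'
    '"web01","isolation.tools.copy.disable","True"\n'
    '"db01","isolation.tools.copy.disable","True"\n'
)

print(vms_missing_setting(all_vms_csv, settings_csv, "isolation.tools.copy.disable"))
```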

What if I want to run a test to make sure it will work, without actually changing anything? PowerShell’s -WhatIf parameter addresses that!

Get-VM | New-AdvancedSetting -Name "isolation.tools.copy.disable" -Value $true -Force -WhatIf
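If you want the same safety net in your own tooling, the idea behind -WhatIf is simple: work out the change set first, and only apply it when explicitly asked. The Python sketch below shows that dry-run pattern on an in-memory inventory. It is an analogy, not how PowerShell implements -WhatIf. The inventory dictionary stands in for real VMs, and apply_setting and its "What if:" message format are made up for this example.

```python
def apply_setting(
    inventory: dict[str, dict[str, str]], key: str, value: str, whatif: bool = True
) -> list[str]:
    """Set key = value on every VM that doesn't already have that exact value.

    With whatif=True (the default) nothing is modified; the planned
    changes are only printed and returned.
    """
    planned = [vm for vm, settings in inventory.items() if settings.get(key) != value]
    for vm in planned:
        if whatif:
            print(f'What if: set {key} = "{value}" on {vm}')
        else:
            inventory[vm][key] = value
    return planned


# Hypothetical in-memory inventory standing in for real VMs
inventory = {
    "web01": {"isolation.tools.copy.disable": "TRUE"},
    "db01": {},
}
apply_setting(inventory, "isolation.tools.copy.disable", "TRUE")               # dry run: prints only
apply_setting(inventory, "isolation.tools.copy.disable", "TRUE", whatif=False)  # actually applies
```

Making the dry run the default means a typo costs you a confusing printout rather than a fleet-wide change.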

Key takeaway: What I’ve introduced here is a powerful way to manage security using IT tools. That power also means that mistakes in your scripting can make changes you DON’T want, at scale. A single typo could be catastrophic, and that’s what the -WhatIf parameter is for. I’d also recommend testing on a single VM or a small group of VMs before doing something to EVERY VM (or host or datastore or…)
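One way to follow that advice is to split the target list into a small pilot batch and the rest, and only move on once the pilot checks out. Here’s a minimal Python sketch. The pilot size and VM names are arbitrary examples. The apply and verify callables are placeholders for whatever change and check you trust, such as the Get-AdvancedSetting report. The staged_rollout helper is hypothetical.

```python
from typing import Callable


def staged_rollout(
    vms: list[str],
    apply: Callable[[str], None],
    verify: Callable[[str], bool],
    pilot_size: int = 2,
) -> list[str]:
    """Apply a change to a pilot batch first, then to everyone else.

    Stops before touching the remaining VMs if any pilot VM fails
    verification. Returns the list of VMs that were changed.
    """
    pilot, rest = vms[:pilot_size], vms[pilot_size:]
    for vm in pilot:
        apply(vm)
    failed = [vm for vm in pilot if not verify(vm)]
    if failed:
        print("Pilot failed on:", ", ".join(failed), "- stopping here")
        return pilot
    for vm in rest:
        apply(vm)
    return pilot + rest


# Hypothetical stand-ins for a real change and a real check
settings: dict[str, str] = {}
changed = staged_rollout(
    ["web01", "web02", "db01", "app01"],
    apply=lambda vm: settings.__setitem__(vm, "TRUE"),
    verify=lambda vm: settings.get(vm) == "TRUE",
)
print(changed)
```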

We never logged into the ESXi shell!

Note that in these examples and even in the vCLI example in the Hardening Guide, we’re not logging into the ESXi shell.

In my humble opinion, logging into the ESXi shell should be a “Break Glass” scenario.

Use vCenter’s Role Based Access Controls to limit the scope of what administrators can do. Do you know what’s even better? Using VMware’s new Log Insight tool, I made the change and saw the log message from vCenter showing that the virtual machine was reconfigured. That’s something you can now alert on.

Wrap up

We are able to leverage Role Based Access Controls at vCenter to limit the scope of what people can do
We can make changes to VMs without adding additional worries
We can make these changes at scale with consistent, reportable results
We can trap alerts of those changes, providing the checks and balances
- This is what I refer to as the Ronald Reagan School of IT and Security Management – Trust But Verify

The point here is not to fight the Unix/Windows battle. The point is that even though there are Unix-like capabilities within ESXi, it’s the vSphere API and its libraries and toolkits that, when used for security, can provide better ways of doing things and reporting on them.

The DISA STIGs clearly weren’t written to leverage the capabilities available. To me, they feel like they were written by a Unix guy. That’s OK if you’re managing Unix (I guess), but, as I said above, this isn’t Unix.

Thanks for reading,

mike

For more information on using PowerCLI to manage your vSphere environment, you should follow my mate Alan Renouf (@alanrenouf) and read his blog.

For more information on managing your vSphere environment using Unix tools, there’s no better place to go than the blog of my friend William Lam (@lamw).