In a previous blog post I went over the details on how ESXi uses a TPM 2.0 chip to provide assurance that Secure Boot did its job and how that “attestation” rolls up to vCenter to be reported on.
In this blog article I’m going to go over some of the steps necessary to configure an ESXi host to use a TPM 2.0 chip. Now, I have only a limited number of hardware systems in my lab from which to do this, but the steps should be familiar, regardless of the server model.
Stop! Important Note!
Please see my other blog on “Prepping an ESXi 6.7 host for Secure Boot”. TPM 2.0’s function on an ESXi host is to attest that Secure Boot has done its job. If you cannot successfully boot with Secure Boot FIRST, then don’t bother trying to configure the host for TPM 2.0. You need Secure Boot working FIRST. First rule of good troubleshooting: limit the number of changes!
As called out in the documentation, there are a few prerequisites you need to meet before starting this process.
To use a TPM 2.0 chip, your vCenter Server environment must meet these requirements:
vCenter Server 6.7
ESXi 6.7 host with TPM 2.0 chip installed and correctly configured in the UEFI BIOS
UEFI Secure Boot enabled
Server BIOS settings
Correctly configuring the TPM 2.0 devices in the BIOS involves ensuring a number of settings are correct.
- The TPM is set to use SHA-256 hashing
- If available, it must also be set to use the TIS/FIFO (First-In, First-Out) interface and not CRB (Command Response Buffer)
- TXT must be disabled
- Yes, we use TXT when using TPM 1.2 but it is not yet implemented in TPM 2.0 on ESXi (and yes, I ran into this specifically!)
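If you keep notes on how each host is configured, the checklist above is easy to turn into a quick sanity check. The following is only a minimal sketch in Python: the dictionary keys (`hash_algorithm`, `interface`, `txt_enabled`, `secure_boot_enabled`) are hypothetical labels I made up for illustration. They are not read from any BIOS, Dell, or VMware API, so you would have to fill them in by hand or from your own tooling.

```python
def check_tpm_bios_settings(settings):
    """Return a list of problems found in a dict describing a host's BIOS settings.

    The keys are hypothetical labels for this sketch; map them to whatever
    your server's BIOS or management tooling actually reports.
    """
    problems = []

    # Accept "SHA-256", "SHA256", or "sha256" as the same setting.
    hash_algorithm = settings.get("hash_algorithm", "").upper().replace("-", "")
    if hash_algorithm != "SHA256":
        problems.append("Set the TPM to use SHA-256 hashing")

    # Not every server exposes the interface option, so only check it when present.
    interface = settings.get("interface")
    if interface is not None and interface.upper() not in ("TIS", "FIFO", "TIS/FIFO"):
        problems.append("Set the TPM interface to TIS/FIFO, not CRB")

    if settings.get("txt_enabled", False):
        problems.append("Disable TXT")

    if not settings.get("secure_boot_enabled", False):
        problems.append("Enable UEFI Secure Boot")

    return problems


# Hypothetical example: a host with the wrong hash bank and TXT still enabled.
example_host = {
    "hash_algorithm": "SHA1",
    "interface": "FIFO",
    "txt_enabled": True,
    "secure_boot_enabled": True,
}
for problem in check_tpm_bios_settings(example_host):
    print(problem)
```

An empty list means the settings match the requirements listed above. It says nothing about whether attestation will actually pass.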
The servers I have in my lab are Dell PowerEdge R630s. They originally came with TPM 1.2 devices but I had them upgraded to TPM 2.0 and they are running BIOS version 2.6.0.
Here are the settings in the System Security part of my servers’ BIOS. Your systems may look different but the options should be similar.
When I first started this process I did what most do. I didn’t read the docs. I like to break things and see if I can fix them. And then ask questions of the engineers. Why do I do this? Well, for one, I believe I learn faster by breaking and fixing and besides, it’s a lot more fun for me. Also, I’m trying to replicate what customers may encounter. Oh, sure, 99% of you actually read the docs before jumping on to Twitter to ask a question, right? RIGHT? Well, I’m there for that 1% who don’t!
When I started, I got the TPM 2.0 devices installed and I then installed ESXi 6.7 (after updating my VCSA first, of course!). The result was an error on the summary page of the ESXi host.
Note: I do not have 117 ESXi hosts at my disposal. Yes, I have been asked that.
I went into the BIOS and started playing around with settings. I cleared the “TPM Hierarchy” (the contents of the TPM) but that didn’t do it. I was still getting an alarm that things weren’t configured correctly.
One of our engineers, Sam, was awesome. I have to give her credit for maintaining her patience with me. She had me look at the logs and sure enough, we found something interesting:
2018-05-09T21:30:21.060Z cpu23:2099722)WARNING: tpmDriver: TpmDriverInitImpl:532: TPM 2 SHA-256 PCR bank not found to be active.
2018-05-09T21:30:21.060Z cpu23:2099722)tpmdriver failed to load.
2018-05-09T21:30:21.061Z cpu23:2099722)WARNING: Elf: 3144: Kernel based module load of tpmdriver failed: Failure <Mod_LoadDone failed>
Oh look! The TPM wasn’t set to use SHA-256 hashing! So I set the TPM to use SHA-256 hashing.
The hashing algorithm is selected on the TPM Advanced settings page. See below:
Note that when I took this screenshot I had TXT enabled. This caused another set of errors in the log files. Here’s the text from that.
[root@esxi-117:/var/log] grep tpm vmkernel.log
2018-05-10T14:02:27.659Z cpu29:2097807)Activating Jumpstart plugin tpm.
2018-05-10T14:02:27.709Z cpu46:2099728)Loading module tpmdriver ...
2018-05-10T14:02:27.711Z cpu46:2099728)Elf: 2101: module tpmdriver has license VMware
2018-05-10T14:02:27.716Z cpu46:2099728)tpmDriver: TpmDriverFindIoMemory:332: Found TPM at base: 0xfed40000
2018-05-10T14:02:27.716Z cpu46:2099728)tpmDriver: Tpm2Init:1582: Activated locality 0
2018-05-10T14:02:27.716Z cpu46:2099728)tpmDriver: Tpm2CheckInterface:603: TPM is in FIFO mode.
2018-05-10T14:02:27.726Z cpu46:2099728)tpmDriver: Tpm2Init:1596: Initialization of TPM 2 impl done.
2018-05-10T14:02:27.736Z cpu46:2099728)tpmDriver: Tpm2LogVendor:1551: Vendor ID: NTC
2018-05-10T14:02:27.777Z cpu46:2099728)tpmDriver: Tpm2ResMgr_Init:1415: TPM 2.0 Resource manager initialized.
2018-05-10T14:02:27.817Z cpu46:2099728)Mod: 4962: Initialization of tpmdriver succeeded with module ID 102.
2018-05-10T14:02:27.817Z cpu46:2099728)tpmdriver loaded successfully.
2018-05-10T14:02:27.820Z cpu29:2097807)Jumpstart plugin tpm activated.
2018-05-10T14:02:55.920Z cpu18:2100795)tpmDriver: Tpm2ResMgrProcessResponse:846: Error: TPM command error code 0x18b
While going through this process I was sharing my experiences on the vExpert Slack channel and others had come across the “Tpm2ResMgrProcessResponse:846: Error: TPM command error code 0x18b” error as well.
It was at this time that I was told by Engineering to disable TXT. TXT has not been implemented in our current TPM 2.0 code.
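Both problems so far showed up as a recognizable string in vmkernel.log, so a small script can flag them. This is a rough sketch in Python, not a VMware tool: the function name `diagnose_vmkernel_log` is my own, and the only signatures it knows about are the two messages quoted in this post, paired with the fix that worked on my hosts. Other hardware may log different messages.

```python
# Message fragments copied from the vmkernel.log excerpts in this post,
# each paired with the BIOS change that resolved it in my lab.
KNOWN_TPM_ISSUES = [
    ("SHA-256 PCR bank not found to be active",
     "Set the TPM to use SHA-256 hashing in the BIOS"),
    ("TPM command error code 0x18b",
     "Disable TXT in the BIOS"),
]


def diagnose_vmkernel_log(lines):
    """Return the suggested fixes for any known TPM messages found in the log lines."""
    findings = []
    for line in lines:
        for fragment, fix in KNOWN_TPM_ISSUES:
            # Report each fix once, even if the message repeats on every boot.
            if fragment in line and fix not in findings:
                findings.append(fix)
    return findings


# Two lines taken from the log excerpts above.
sample_lines = [
    "2018-05-09T21:30:21.060Z cpu23:2099722)WARNING: tpmDriver: TpmDriverInitImpl:532: "
    "TPM 2 SHA-256 PCR bank not found to be active.",
    "2018-05-10T14:02:55.920Z cpu18:2100795)tpmDriver: Tpm2ResMgrProcessResponse:846: "
    "Error: TPM command error code 0x18b",
]
for fix in diagnose_vmkernel_log(sample_lines):
    print(fix)
```

To use it on a real log, pass the function the lines of a copy of vmkernel.log, for example `diagnose_vmkernel_log(open("vmkernel.log"))`.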
Time to file a bug report
Reboot number, oh, I don’t know, 3? 4? I still encountered a failure. So I filed a bug. This time the host was reporting a “Failed” attestation and there was nothing in the kernel log stating why. Another one of our engineers looked at the bug and the vCenter and ESXi support bundles and found the latest culprit.
vpxd-11.log:2018-05-10T17:41:05.588Z info vpxd [Originator@6876 sub=Attestation opID=HB-host-29@74-be7ec9-SWI-4843ebf6] Starting host attestation; [vim.HostSystem:host-29,esxi-117.foobar.com]
vpxd-11.log:2018-05-10T17:41:05.588Z info vpxd [Originator@6876 sub=Attestation opID=HB-host-29@74-be7ec9-SWI-4843ebf6] No cached identity key, loading from DB
vpxd-11.log:2018-05-10T17:41:05.591Z warning vpxd [Originator@6876 sub=Attestation opID=HB-host-29@74-be7ec9-SWI-4843ebf6] Failed to update integrity report; [vim.HostSystem:host-29,esxi-117.foobar.com], N7Vmacore9ExceptionE(No identity key in DB, try to reconnect host)
A-Ha! “No identity key in DB, try to reconnect host” explains it! What this means is that the host was added to vCenter without a TPM 2.0 chip enabled in the BIOS, and the TPM 2.0 chip was only enabled after the host was added. In my case, my hosts were added a couple of years ago and I installed the TPM 2.0 devices after the fact. As a result, there is no TPM Endorsement Key stored in the VCDB. This trust is set up when vCenter first adds the host to a cluster.
The solution was simple. Disconnect and reconnect the host. Put the host into Maintenance Mode, right click and select Connection…Disconnect and then right click again and select Connection…Connect. No need to remove the host from inventory.
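If you have a vCenter support bundle in hand, you can look for this same condition yourself. Here is a minimal sketch in Python that pulls the host name and the reason out of “Failed to update integrity report” lines. The regular expression is based only on the three vpxd log lines shown above, and the function names are my own, so treat the pattern as an assumption that may need adjusting for other vCenter versions.

```python
import re

# Pattern inferred from the vpxd log lines quoted in this post. It captures the
# host name and the reason text inside the exception's parentheses.
ATTESTATION_FAILURE = re.compile(
    r"Failed to update integrity report; "
    r"\[vim\.HostSystem:[^,\]]+,(?P<host>[^\]]+)\], "
    r"\w+\((?P<reason>[^)]*)\)"
)


def find_attestation_failures(lines):
    """Return (host, reason) tuples for each attestation failure found in the log lines."""
    failures = []
    for line in lines:
        match = ATTESTATION_FAILURE.search(line)
        if match:
            failures.append((match.group("host"), match.group("reason")))
    return failures


def needs_reconnect(reason):
    """True when the reason matches the missing identity key case described in this post."""
    return "No identity key in DB" in reason


# The warning line from the vpxd log excerpt above.
sample_line = (
    "vpxd-11.log:2018-05-10T17:41:05.591Z warning vpxd [Originator@6876 sub=Attestation "
    "opID=HB-host-29@74-be7ec9-SWI-4843ebf6] Failed to update integrity report; "
    "[vim.HostSystem:host-29,esxi-117.foobar.com], "
    "N7Vmacore9ExceptionE(No identity key in DB, try to reconnect host)"
)
for host, reason in find_attestation_failures([sample_line]):
    if needs_reconnect(reason):
        print(host + ": disconnect and reconnect this host in vCenter")
```

The script only reports what it finds in the log. The disconnect and reconnect itself is still done in vCenter as described above.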
Unfortunately, when I looked in the documentation (after the fact, naturally) to see if the error and solution were documented, the response was “Call support”. We quickly got that fixed and now the documentation says the following:
Note: If you add a TPM 2.0 chip to an ESXi host that is already managed by a vCenter Server, you must first disconnect the host, then reconnect it. See vCenter Server and Host Management documentation for information about disconnecting and reconnecting hosts.
In fact, we even added a section on troubleshooting based directly on my experiences that led to this blog!
At this point the host showed up as having passed attestation! Woo-Hoo! Secure Boot has done its job and I can provide a report that says so, based on TPM 2.0 trust.
I hope this has been helpful for you in setting up your ESXi host to use TPM 2.0. I think this whole process of NOT looking at the documentation and fumbling my way through the setup and configuration helped us end up with much better documentation and a better understanding of where things could go wrong. That’s #winning in my book.
I want to thank all the engineers that helped out on this. It really helped me understand what’s going on under the covers and enabled me to write these blogs.
@vspheresecurity is a curated list of vSphere Security specific tweets.
Thanks for reading!