The Code Keepers: vSAN Native Encryption – Part 2

This post continues from Part 1, which provided an overview of vSAN Native Encryption. In this part, we discuss use cases for managing an encrypted vSAN cluster.

Setting up domain of trust

There are three parties involved in vSAN encryption: (1) the Key Management Server, or KMS (the entity that generates the keys), (2) vCenter, and (3) the vSAN host, or ESXi host.

Set up domain of trust

Before we attempt to encrypt any data on vSAN, the first step is to set up a domain of trust among the three parties (KMS, vCenter, and the vSAN host).

Setting up the domain of trust follows the standard Public Key Infrastructure (PKI) based management of digital certificates. The exact steps are dependent on the KMS provider.

Once the domain of trust is set up, the KMS, vCenter, and the vSAN host can begin communicating with each other. The key exchange itself happens between the vSAN host and the KMS.

The vSAN host provides a key reference, or key ID, to the KMS, and the KMS responds with the key that is associated with that key ID.
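
To make the key-ID lookup concrete, here is a minimal sketch. It is a toy in-memory model, not the protocol a real KMS speaks, and the class name `ToyKMS` and its methods are hypothetical names chosen for illustration.

```python
import secrets


class ToyKMS:
    """In-memory stand-in for a KMS (hypothetical; for illustration only)."""

    def __init__(self):
        self._keys = {}  # key ID -> key bytes

    def create_key(self) -> str:
        """Generate a new 256-bit key and return only its ID."""
        key_id = secrets.token_hex(8)
        self._keys[key_id] = secrets.token_bytes(32)
        return key_id

    def get_key(self, key_id: str) -> bytes:
        """Return the key associated with key_id, as a host would request it."""
        if key_id not in self._keys:
            raise KeyError(f"unknown key ID: {key_id}")
        return self._keys[key_id]


kms = ToyKMS()
kek_id = kms.create_key()   # the host keeps only this reference
kek = kms.get_key(kek_id)   # the KMS answers with the key for that ID
```

The point of the indirection is that the host only needs to remember the key ID; the key itself lives in the KMS.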

Turning on Encryption

Encryption is turned on by a single click

 

Once the domain of trust is set up, encryption is enabled by selecting the “Encryption” checkbox and disabled by clearing it.

Although turning encryption on or off is just a matter of toggling a checkbox, this step should be managed with extreme care, since every change triggers a disk format change across the cluster (described below).

Permissions

While we are on the topic of turning on encryption, note that there are three separate sets of privileges: (1) editing the cluster, i.e. adding nodes, removing nodes, and adding disks; (2) managing keys, which allows rekey operations on the cluster; and (3) managing encryption, which allows enabling or disabling encryption. This lets vSAN control permissions at a very fine grain.
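
The separation of the three privilege sets can be sketched as follows. The enum members and operation names are hypothetical labels for this illustration; they are not the actual vSphere privilege identifiers.

```python
from enum import Flag, auto


class Privilege(Flag):
    """Hypothetical labels for the three privilege sets described above."""
    EDIT_CLUSTER = auto()       # add/remove nodes, add disks
    MANAGE_KEYS = auto()        # rekey operations
    MANAGE_ENCRYPTION = auto()  # enable/disable encryption


# Which privilege each (illustrative) operation needs.
REQUIRED = {
    "add_host": Privilege.EDIT_CLUSTER,
    "rekey": Privilege.MANAGE_KEYS,
    "enable_encryption": Privilege.MANAGE_ENCRYPTION,
    "disable_encryption": Privilege.MANAGE_ENCRYPTION,
}


def is_allowed(granted: Privilege, operation: str) -> bool:
    """True if the granted privileges include the one the operation needs."""
    return REQUIRED[operation] in granted


storage_admin = Privilege.EDIT_CLUSTER
security_admin = Privilege.MANAGE_KEYS | Privilege.MANAGE_ENCRYPTION
```

With this split, a storage administrator can grow the cluster without being able to rekey it or switch encryption off.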

Disk format change (DFC)

Every time encryption is turned on or turned off, the vSAN cluster goes through a disk format change, or DFC for short. DFC creates a new partition on the disk. This partition holds metadata (a few KB) that vSAN uses to manage operations on the encrypted cluster. This step essentially prepares the disk to encrypt any write that is directed to it.

DFC preps disk to encrypt writes

A few things to keep in mind with respect to DFC.

(1) The DFC process is orchestrated as a rolling upgrade, one disk group at a time

(2) If data is present on the disk, it is moved out before DFC is initiated, which ensures the data is preserved

There is an optional feature called “Erase disks before use”. Checking this box soft-erases the disk before new data is written. Note that this step can be quite time consuming and should be used with care. It is not recommended unless the server or the disk is going through a planned RMA. If the disk or server is leaving the premises, it is recommended to combine this option with other disk wipe utilities.

The other checkbox, “Allow reduced redundancy”, hints to vSAN that it is OK to have a reduced number of replicas during the DFC process. This is especially helpful in small clusters with limited capacity headroom.
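
The rolling nature of DFC can be sketched as a simple loop. This is a simplified model under stated assumptions: the `DiskGroup` class, the log messages, and the idea that the same data returns to the same disk group are all illustrative, not a description of vSAN internals.

```python
from dataclasses import dataclass


@dataclass
class DiskGroup:
    name: str
    used_gb: int
    encrypted_format: bool = False


def rolling_dfc(groups, log):
    """Convert one disk group at a time; move data out first if any is present."""
    for g in groups:
        moved = g.used_gb
        if moved:
            log.append(f"evacuate {g.name} ({moved} GB)")
            g.used_gb = 0
        log.append(f"reformat {g.name}")
        g.encrypted_format = True
        if moved:
            log.append(f"restore {g.name} ({moved} GB)")
            g.used_gb = moved


groups = [DiskGroup("dg1", 500), DiskGroup("dg2", 0)]
log = []
rolling_dfc(groups, log)
```

An empty disk group skips the evacuation steps entirely, which is why DFC on fresh disks is much quicker.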

Different Encryption Keys in vSAN cluster

An encrypted vSAN cluster operates on three types of keys: (1) the primary key generated by the KMS, called the KEK (key encryption key) or key wrapping key; (2) the disk encryption key (DEK), one per disk; and (3) the host encryption key (HEK), used for encrypting core dumps.

3 Types of encryption keys

The host encryption key is maintained separately to ensure Support Services can decrypt the core dump without having access to the encrypted data.

KMS is responsible for providing the “key encryption key” and the “host encryption key”. The “disk encryption key” is generated by vSAN.

Rekey Scenarios

Rekey is the process of updating the encryption keys. The frequency and type of rekey will depend on your company guidelines. There are two types of rekey: shallow rekey and deep rekey. Shallow rekey changes the KEK and is a fast process. Deep rekey changes the disk encryption key; it is a slow process and requires a disk format change. By default, every deep rekey operation also forces a shallow rekey. Hardware-based encryption would require deep rekey with firmware changes.
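
The difference between the two rekey types follows from the key hierarchy: the KEK only wraps the DEK, while the DEK protects the data. The sketch below illustrates this. `toy_wrap` is a deliberately insecure placeholder for a real key-wrap algorithm, and the `Disk` class and its fields are hypothetical; none of this is vSAN's actual implementation.

```python
import hashlib
import secrets


def toy_wrap(kek: bytes, dek: bytes) -> bytes:
    """Placeholder for a real key-wrap algorithm: XOR with a KEK-derived pad.
    Illustrative only; NOT secure."""
    pad = hashlib.sha256(kek).digest()
    return bytes(a ^ b for a, b in zip(dek, pad))


toy_unwrap = toy_wrap  # XOR is its own inverse


class Disk:
    def __init__(self, kek: bytes):
        self.dek = secrets.token_bytes(32)          # generated locally, one per disk
        self.wrapped_dek = toy_wrap(kek, self.dek)  # what would be persisted
        self.format_changes = 0

    def shallow_rekey(self, old_kek: bytes, new_kek: bytes) -> None:
        """Re-wrap the same DEK under a new KEK; the data is untouched."""
        dek = toy_unwrap(old_kek, self.wrapped_dek)
        self.wrapped_dek = toy_wrap(new_kek, dek)

    def deep_rekey(self, new_kek: bytes) -> None:
        """New DEK, so the data must be rewritten (modelled as a format change).
        The KEK is rotated too, mirroring the forced shallow rekey."""
        self.dek = secrets.token_bytes(32)
        self.format_changes += 1
        self.wrapped_dek = toy_wrap(new_kek, self.dek)


kek1, kek2, kek3 = (secrets.token_bytes(32) for _ in range(3))
disk = Disk(kek1)
dek_before = disk.dek
disk.shallow_rekey(kek1, kek2)
dek_after_shallow = disk.dek
disk.deep_rekey(kek3)
```

Shallow rekey only touches a few bytes of wrapped key material per disk, which is why it is fast; deep rekey changes the key the data itself is encrypted with.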

Connectivity requirements between KMS and vSAN Host

vSAN encryption doesn’t require continuous connectivity with the KMS. That being said, after the initial trust is set up, certain management operations require connectivity between the KMS and the vSAN host. These are:

(1) Any Rekey (either shallow or deep) operation requires connectivity between KMS and vSAN host

(2) An encrypted vSAN host requires access to KMS (to get the KEK) in order to boot up

Some more use cases

(1) Turning on Encryption on existing vSAN cluster

(2) Host Reboot scenarios

(3) Swapping KMS servers

(4) Encrypted disk pulled out and placed back in the cluster

(5) Adding a new un-encrypted host to encrypted cluster

…and finally

(6) Turning off encryption

Turning on Encryption on an existing vSAN cluster

As discussed earlier, encryption is turned on by checking the “Encryption” option. Let’s explore what happens under the hood.

vSAN detects that:

(1) There is existing data on the disk

(2) The disk has never been encrypted

vSAN then proceeds as follows:

(3) vSAN initiates a “disk format change”. This is the process in which vSAN stamps the encryption metadata on the disk. Once it completes, the disk (a) will encrypt any write and (b) is managed by vSAN as an encrypted disk.

(4) Data is moved out before DFC is initiated and written back to the encrypted drive. Please ensure there is spare capacity in the cluster.

(5) The entire process must complete for all hosts (and disks) before the cluster is ready to encrypt data
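
The reminder about spare capacity can be turned into a rough pre-check. The sketch below is a back-of-the-envelope calculation only: it assumes data from one disk group can land anywhere else in the cluster and ignores storage policies, fault domains, and any safety margin. The function name and input shape are hypothetical.

```python
from typing import Dict, Tuple


def can_evacuate_each_group(groups: Dict[str, Tuple[int, int]]) -> Dict[str, bool]:
    """groups maps disk group name -> (capacity_gb, used_gb).
    A group is considered evacuable if the free space on all other
    groups is at least as large as the data it currently holds."""
    result = {}
    for name, (_, used) in groups.items():
        free_elsewhere = sum(
            cap - u for other, (cap, u) in groups.items() if other != name
        )
        result[name] = free_elsewhere >= used
    return result


roomy = {"dg1": (1000, 300), "dg2": (1000, 400), "dg3": (1000, 500)}
tight = {"dg1": (1000, 900), "dg2": (1000, 800)}

roomy_ok = can_evacuate_each_group(roomy)
tight_ok = can_evacuate_each_group(tight)
```

In the `tight` example neither disk group can be emptied onto the other, which is the situation in which the process would stall or need the “allow reduced redundancy” option.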

Host reboots

This section describes the behavior when hosts in the cluster reboot. To boot up, an encrypted host requires access to the KEK that was used to encrypt its disk encryption keys, so connectivity to the KMS is required at this time. Switching the KMS is not advisable, as the host would need access to the old KEK to boot up.

When the host boots up, it uses the KEK ID (the reference used to look up the KEK) to get the KEK from the KMS. Note that the KEK ID is persisted on the host.

Let us discuss the scenario where the KEK has changed. The host gets the old KEK from the KMS, using the persisted KEK ID. Note that at this point the host is not aware that the KEK has changed. vCenter will notice this and orchestrate the reconciliation: it forces a shallow rekey operation on the host with the new KEK.

It is highly recommended not to change the KMS server when hosts in the cluster are rebooting. The hosts cannot boot up without access to the old KEK.

By the same principle, the process is identical when a partitioned host rejoins the cluster.
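
The reboot and reconciliation flow can be sketched as below. The KMS is modelled as a plain dictionary, and the `Host` class, `reconcile` function, and key IDs are hypothetical. A real shallow rekey would also re-wrap each disk's DEK; this sketch only tracks which KEK the host holds.

```python
class Host:
    def __init__(self, kek_id: str):
        self.kek_id = kek_id  # persisted on the host across reboots
        self.kek = None

    def boot(self, kms: dict) -> None:
        """Fetch the KEK by the persisted ID. Raises KeyError if the KMS
        does not hold that key (e.g. the KMS was swapped)."""
        self.kek = kms[self.kek_id]


def reconcile(host: Host, cluster_kek_id: str, kms: dict) -> str:
    """vCenter-side check: if the host booted with a stale KEK,
    move it to the cluster's current KEK (a shallow rekey)."""
    if host.kek_id == cluster_kek_id:
        return "in sync"
    host.kek_id = cluster_kek_id
    host.kek = kms[cluster_kek_id]
    return "shallow rekey"


kms = {"kek-1": b"old-key", "kek-2": b"new-key"}
host = Host("kek-1")
host.boot(kms)                               # boots with the old KEK
outcome = reconcile(host, "kek-2", kms)      # vCenter brings it up to date
```

If the old key were missing from `kms`, `boot` would fail before reconciliation could happen, which is why swapping the KMS while hosts are rebooting is discouraged.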

Swapping KMS servers

Swapping KMS servers is essentially a shallow rekey operation. The figure below describes the steps in detail.

Encrypted disk pulled out and placed back in

Let’s now consider a scenario where an encrypted disk is pulled out and placed back in. Let’s further assume the KEK has changed in the meantime.

When the disk is placed back in, the vSAN disk check determines that:

(1) The disk belongs to the cluster

(2) The disk belongs to a specific disk group

(3) The disk is already “stamped” with encryption information, so DFC is not initiated

If the KEK hasn’t changed, no further action is required at this point. However, if the KEK has changed, the disk goes through the shallow rekey process, similar to the host reboot scenario.

At this point the disk is ready to accept encrypted data.
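
The decision made for a reinserted disk can be summarized in a few lines. The dictionary fields and action strings are hypothetical, and the sketch assumes the disk has already been recognized as belonging to the cluster.

```python
def actions_for_reinserted_disk(disk: dict, current_kek_id: str) -> list:
    """Return the (illustrative) steps for a disk that is placed back in."""
    actions = [f"rejoin disk group {disk['disk_group']}"]
    if not disk["encryption_stamped"]:
        # Never prepared for encryption: needs a disk format change.
        actions.append("disk format change")
    elif disk["kek_id"] != current_kek_id:
        # Already stamped, but wrapped under an older KEK.
        actions.append("shallow rekey")
    return actions


disk = {"disk_group": "dg1", "encryption_stamped": True, "kek_id": "kek-1"}
same_kek = actions_for_reinserted_disk(disk, "kek-1")
changed_kek = actions_for_reinserted_disk(disk, "kek-2")
```

A stamped disk never triggers DFC on reinsertion; at most it needs its key re-wrapped under the current KEK.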

Adding a new unencrypted host to the encrypted cluster

When a new unencrypted host is added to the encrypted cluster, a few things happen in the background.

vSAN discovers:

(1) New host is added with unencrypted disks

(2) Every disk shall go through disk format change (DFC) on a rolling basis, one disk group at a time

(3) As mentioned before, if there is data on the disk, the data is moved out, the disk goes through DFC, and the data is written back

(4) If it is a fresh new host, no data needs to be evacuated during the DFC process

Finally, turning off encryption.

Turning off Encryption

(1) Shallow Rekey is performed with all zero KEK (Key Encryption key)

(2) Data on the disk is decrypted

(3) The disk goes through a disk format change and the drive is no longer stamped as “encrypted”

(4) The entire disk format change happens on a rolling upgrade basis, one disk group at a time