This post continues from Part 1, which provided an overview of vSAN Native Encryption. In this part, we discuss use cases for managing an encrypted vSAN cluster.
Setting up domain of trust
There are three parties involved in vSAN encryption: (1) the Key Management Server, or KMS server (the entity that generates the keys), (2) vCenter and (3) the vSAN host, or ESXi host.
Before we attempt to encrypt any data on vSAN, the first step is to set up a domain of trust among the three parties (KMS, vCenter and vSAN host).
Setting up the domain of trust follows the standard Public Key Infrastructure (PKI) based management of digital certificates. The exact steps are dependent on the KMS provider.
Once the domain of trust is set up, the KMS, vCenter and the vSAN host can begin communicating with each other. The key exchange itself happens between the vSAN host and the KMS server.
The vSAN host provides a key reference, or key ID, to the KMS server, and the KMS server responds with the key that is associated with that key ID.
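The key-by-ID exchange described above can be sketched with a toy in-memory KMS. This is only an illustration: `ToyKMS`, `create_key` and `get_key` are hypothetical names, not a real KMS client or protocol.

```python
import secrets


class ToyKMS:
    """Hypothetical in-memory stand-in for a KMS: maps key IDs to keys."""

    def __init__(self):
        self._keys = {}

    def create_key(self):
        """Generate a new 256-bit key and return its ID, not the key itself."""
        key_id = secrets.token_hex(8)
        self._keys[key_id] = secrets.token_bytes(32)
        return key_id

    def get_key(self, key_id):
        """Return the key associated with key_id."""
        try:
            return self._keys[key_id]
        except KeyError:
            raise LookupError(f"unknown key ID: {key_id}") from None


kms = ToyKMS()
kek_id = kms.create_key()   # the host keeps this reference
kek = kms.get_key(kek_id)   # the host asks the KMS for the key by its ID
```

The point of the indirection is that the host only needs to remember the reference; the key itself is fetched from the KMS when it is needed.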
Turning on Encryption
Once the domain of trust is set up, encryption is enabled by selecting the "Encryption" checkbox and disabled by clearing it.
Although turning encryption on or off amounts to toggling a checkbox, this step should be managed with extreme care.
While we are on the topic of turning on encryption, it is important to note that there are three separate sets of privileges: (1) Editing the cluster, i.e. adding nodes, removing nodes and adding disks. (2) Managing keys, which allows rekey operations on the cluster. (3) Managing encryption, i.e. enabling or disabling encryption. This allows vSAN to control permissions at a very fine grain.
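As a rough illustration of why three separate privilege sets are useful, the sketch below models them as flags. The names `Privilege`, `REQUIRED` and `is_allowed` are made up for this example and are not vCenter privilege identifiers.

```python
from enum import Flag, auto


class Privilege(Flag):
    """Hypothetical names for the three privilege sets described above."""
    EDIT_CLUSTER = auto()        # add nodes, remove nodes, add disks
    MANAGE_KEYS = auto()         # rekey operations
    MANAGE_ENCRYPTION = auto()   # enable or disable encryption


# Assumed mapping from an operation to the privilege it needs.
REQUIRED = {
    "add_host": Privilege.EDIT_CLUSTER,
    "rekey": Privilege.MANAGE_KEYS,
    "enable_encryption": Privilege.MANAGE_ENCRYPTION,
    "disable_encryption": Privilege.MANAGE_ENCRYPTION,
}


def is_allowed(granted, operation):
    """True if the granted privileges include the one the operation needs."""
    return REQUIRED[operation] in granted


storage_admin = Privilege.EDIT_CLUSTER
security_admin = Privilege.MANAGE_KEYS | Privilege.MANAGE_ENCRYPTION
```

With this split, a storage administrator can grow the cluster without being able to turn encryption off, and a security administrator can rekey without being able to change cluster membership.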
Disk format change (DFC)
Every time encryption is turned on or turned off, the vSAN cluster goes through a disk format change, or DFC for short. DFC creates a new partition on the disk. This partition holds metadata (a few KB) that vSAN uses to manage operations on the encrypted cluster. This step essentially prepares the disk to encrypt any write that is directed to it.
A few things to keep in mind with respect to DFC.
(1) The DFC process is orchestrated as a rolling upgrade, one disk group at a time
(2) If data is present on the disk, it is moved out before DFC is initiated, which ensures the data is preserved
There is an optional feature called “Erase disks before use”. Checking this box soft-erases the disk before new data is written. Note that this step can be quite time consuming and should be used with care. It is not recommended to use this option unless the server or the disk is going through a planned RMA. If the disk or server is leaving the premises, it is recommended to combine disk erasure with other disk wipe utilities.
The other checkbox, “Allow reduced redundancy”, hints to vSAN that it is OK to have a reduced number of replicas during the DFC process. This is especially helpful in a small cluster with limited capacity headroom.
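The rolling nature of DFC can be sketched as a simple planner. This is a simplified model, not how vSAN schedules the work: the function name, the capacity check and the step strings are all assumptions made for illustration.

```python
def rolling_dfc(disk_groups, free_capacity_gb, allow_reduced_redundancy=False):
    """Return the ordered steps of a rolling disk format change.

    disk_groups: list of (name, used_gb) tuples, handled one at a time.
    free_capacity_gb: spare capacity elsewhere in the cluster (assumed input).
    """
    steps = []
    for name, used_gb in disk_groups:
        if used_gb > 0:
            if used_gb <= free_capacity_gb:
                steps.append(f"{name}: evacuate {used_gb} GB")
            elif allow_reduced_redundancy:
                # Assumption: fewer replicas are kept while this group is out.
                steps.append(f"{name}: evacuate {used_gb} GB with reduced redundancy")
            else:
                raise RuntimeError(f"{name}: not enough spare capacity")
        steps.append(f"{name}: reformat")
        if used_gb > 0:
            steps.append(f"{name}: move data back")
    return steps


plan = rolling_dfc([("dg1", 100), ("dg2", 0)], free_capacity_gb=200)
```

In this model an empty disk group is simply reformatted, while a populated one is evacuated first and repopulated afterwards, which is why spare capacity (or reduced redundancy) matters.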
Different Encryption Keys in vSAN cluster
An encrypted vSAN cluster operates on three types of keys: (1) The primary key generated by the KMS server. This is called the KEK (key encryption key), or key wrapping key. (2) The disk encryption key (DEK), one per disk. (3) The host encryption key (HEK), used for encrypting core dumps.
The host encryption key is maintained separately to ensure Support Services can decrypt the core dump without having access to the encrypted data.
KMS is responsible for providing the "key encryption key" and the "host encryption key". The "disk encryption key" is generated by vSAN.
Rekey is the process of updating the encryption keys. The frequency and the type of rekey will depend on your company guidelines. There are two types of rekey: shallow rekey and deep rekey. Shallow rekey changes the KEK and is a fast process. Deep rekey changes the disk encryption key; this is a slow process and requires a disk format change. By default, every deep rekey operation also forces a shallow rekey. Hardware-based encryption would require deep rekey with firmware changes.
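To see why shallow rekey is fast and deep rekey is slow, consider the sketch below. It uses a toy XOR cipher built from SHA-256 purely so the example runs with the standard library; it is NOT secure and is not the cipher vSAN uses. `ToyDisk` and its methods are hypothetical names.

```python
import hashlib
import secrets


def toy_cipher(key, data):
    """XOR data with a SHA-256-derived keystream. Toy only, NOT secure.

    XOR is its own inverse, so the same call encrypts and decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyDisk:
    def __init__(self, kek, plaintext):
        dek = secrets.token_bytes(32)            # DEK is generated locally
        self.wrapped_dek = toy_cipher(kek, dek)  # DEK is kept wrapped by the KEK
        self.ciphertext = toy_cipher(dek, plaintext)

    def read(self, kek):
        dek = toy_cipher(kek, self.wrapped_dek)
        return toy_cipher(dek, self.ciphertext)

    def shallow_rekey(self, old_kek, new_kek):
        """Re-wrap the same DEK under a new KEK; the data is not touched."""
        dek = toy_cipher(old_kek, self.wrapped_dek)
        self.wrapped_dek = toy_cipher(new_kek, dek)

    def deep_rekey(self, old_kek, new_kek):
        """Re-encrypt all data under a new DEK, then wrap it with the new KEK."""
        plaintext = self.read(old_kek)
        new_dek = secrets.token_bytes(32)
        self.ciphertext = toy_cipher(new_dek, plaintext)
        self.wrapped_dek = toy_cipher(new_kek, new_dek)


kek1, kek2, kek3 = (secrets.token_bytes(32) for _ in range(3))
disk = ToyDisk(kek1, b"tenant data")
before = disk.ciphertext
disk.shallow_rekey(kek1, kek2)          # only the small wrapped DEK changes
unchanged = disk.ciphertext == before
disk.deep_rekey(kek2, kek3)             # every byte of data is rewritten
```

Shallow rekey only rewrites a few bytes of wrapped key material, whereas deep rekey has to read and rewrite all the data on the disk.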
Connectivity requirements between KMS and vSAN Host
vSAN encryption doesn’t require continuous connectivity with the KMS server. That being said, after the initial trust is set up, certain management operations require connectivity between the KMS and the vSAN host. Those are:
(1) Any Rekey (either shallow or deep) operation requires connectivity between KMS and vSAN host
(2) An encrypted vSAN host requires access to KMS (to get the KEK) in order to boot up
Some more use cases
(1) Turning on Encryption on existing vSAN cluster
(2) Host Reboot scenarios
(3) Swapping KMS servers
(4) Encrypted disk pulled out and placed back in the cluster
(5) Adding a new un-encrypted host to encrypted cluster
(6) Turning off encryption
Turning on Encryption on an existing vSAN cluster
As we discussed earlier, encryption is turned on by checking the "Encryption" option. Let's explore what happens under the hood.
(1) There is existing data on the disk
(2) The disk has never been encrypted
(3) vSAN will initiate a "disk format change"
Note that this is the process in which vSAN stamps the encryption metadata on the disk. Completion of this process signifies that (1) the disk will encrypt any write and (2) the disk is managed by vSAN as an encrypted disk.
(4) Data is moved out before DFC is initiated and written back to the encrypted drive. Please ensure there is spare capacity in the cluster.
(5) The entire process should complete for all hosts (and disks) before the cluster is ready to encrypt data
Host Reboot scenarios
This section describes the behavior when hosts in the cluster reboot. In order to boot up, an encrypted host requires access to the KEK that was used to encrypt its disk encryption keys. Note that connectivity to the KMS is required at this time. Switching the KMS is not advisable, as the host would need access to the old KEK to boot up.
When the host boots up, it uses the KEK ID (the reference used to look up the KEK) to get the KEK (key encryption key) from the KMS. Note that the KEK ID is persisted on the host.
Let us discuss the scenario where the KEK has changed. The host gets the old KEK from the KMS using the persisted KEK ID. Note that at this point the host is not aware that the KEK has changed. vCenter notices this and orchestrates the reconciliation: it forces a shallow rekey operation on the host with the new KEK.
It is highly recommended not to change the KMS server when hosts in the cluster are rebooting. The hosts cannot boot up without access to the old KEK.
By the same principle, the process is identical when a partitioned host rejoins the cluster.
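The boot-time behavior described in this section can be sketched as follows. The KMS is modeled as a plain dictionary from KEK ID to key, and `boot_host` is a hypothetical function; the real reconciliation is orchestrated by vCenter and is more involved than this.

```python
def boot_host(persisted_kek_id, kms, cluster_kek_id):
    """Return (kek_id_now_in_use, actions) for a booting encrypted host.

    kms: dict mapping KEK ID -> key bytes (toy stand-in for the KMS).
    cluster_kek_id: the KEK ID the cluster currently uses (assumed input).
    """
    if persisted_kek_id not in kms:
        # Models a swapped or unreachable KMS: the old KEK is unavailable.
        raise RuntimeError("cannot boot: old KEK not available from KMS")

    actions = [f"unlock disks with {persisted_kek_id}"]

    if persisted_kek_id != cluster_kek_id:
        # The KEK changed while the host was down; reconcile via shallow rekey.
        if cluster_kek_id not in kms:
            raise RuntimeError("cannot rekey: new KEK not available from KMS")
        actions.append(f"shallow rekey to {cluster_kek_id}")
        persisted_kek_id = cluster_kek_id

    return persisted_kek_id, actions


kms = {"kek-1": b"old key", "kek-2": b"new key"}
same = boot_host("kek-1", kms, "kek-1")
changed = boot_host("kek-1", kms, "kek-2")
```

The failure branch is the reason for the warning above: if the KMS that holds the old KEK is no longer reachable, the host has nothing to unlock its disks with.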
Swapping KMS servers
Swapping KMS servers is essentially a shallow rekey operation. The figure below describes the steps in detail.
Encrypted disk pulled out and placed back in
Let's now consider a scenario where an encrypted disk is pulled out and placed back in. Let's further assume the KEK has changed while the disk was out.
When the disk is placed back in, vSAN disk check discovers:
(1) Disk belongs to the cluster
(2) Identifies the specific disk group to which the disk belongs
(3) The disk is already "stamped" with encryption information, hence DFC is not initiated
If the KEK hasn't changed, no further action is required at this point. However, if the KEK has changed, the disk goes through the shallow rekey process, similar to the host reboot scenario.
At this point the disk is ready to accept encrypted data.
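The checks above can be summarized as a small decision function. This is an illustrative sketch only: the `Disk` fields, the action strings and the handling of a disk from another cluster are assumptions, not vSAN behavior taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    cluster_id: str
    disk_group: str
    stamped_encrypted: bool
    kek_id: str


def on_disk_inserted(disk, cluster_id, cluster_kek_id):
    """Return the actions taken when a disk is placed back in (illustrative)."""
    if disk.cluster_id != cluster_id:
        # Assumption: the text only covers disks that belong to the cluster.
        return ["not a member of this cluster"]

    actions = [f"rejoin disk group {disk.disk_group}"]
    if not disk.stamped_encrypted:
        actions.append("disk format change")   # disk was never stamped
    elif disk.kek_id != cluster_kek_id:
        actions.append("shallow rekey")        # stamped, but the KEK moved on
    actions.append("ready")
    return actions


disk = Disk(cluster_id="c1", disk_group="dg2", stamped_encrypted=True, kek_id="kek-1")
no_change = on_disk_inserted(disk, "c1", "kek-1")
kek_changed = on_disk_inserted(disk, "c1", "kek-2")
```

Because the disk is already stamped, the expensive DFC step is skipped in both cases; only the cheap shallow rekey is needed when the KEK has changed.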
Adding a new unencrypted host to the encrypted cluster
When a new unencrypted host is added to the encrypted cluster, a few things happen in the background.
(1) New host is added with unencrypted disks
(2) Every disk shall go through disk format change (DFC) on a rolling basis, one disk group at a time
(3) As mentioned before if there is data on the disk, the data shall be moved out, disk shall go through DFC and data written back
(4) If it is a brand-new host, no data needs to be evacuated during the DFC process
Finally, turning off encryption.
Turning off Encryption
(1) Shallow Rekey is performed with all zero KEK (Key Encryption key)
(2) Data on the disk is decrypted
(3) The disk goes through a disk format change and the drive is no longer stamped as "encrypted"
(4) The entire disk format change happens on a rolling upgrade basis, one disk group at a time