Data security can place a heavy burden on those responsible for the security of their organization’s digital assets. Regulatory requirements and news reports of data leakage provide ample motivation to take the matter seriously. Administrators must secure this data, and one way to achieve this is through data encryption.
At some point, those responsible for these environments must look at how they can implement data encryption. For vSAN-powered environments, enabling encryption services is relatively easy, but a common question arises: “Will enabling vSAN encryption services have a performance impact on my vSAN cluster? And if so, by how much?”
It is a great question, and worth exploring in more detail.
A Primer on vSAN Encryption Offerings
vSAN offers two types of encryption services to make the data in a vSAN cluster more secure. For on-premises environments, both are optional cluster-level services that can be used independently or together; used together, they provide a full end-to-end encryption solution. In VMC on AWS, encryption services are enabled by default.
vSAN Data-at-Rest Encryption. As the name implies, this service encrypts data as it is written to the persistent media used for vSAN storage, and decrypts it as it is read from the datastore. It does this using cryptographic modules in the hypervisor and does not require self-encrypting drives. For more information, see the vSAN Data-at-Rest Encryption Tech Note.
vSAN Data-in-Transit Encryption. This service encrypts vSAN traffic in transit (“in flight”) between the hosts in a vSAN cluster. This helps secure data transmitted through host uplinks that may carry multiple traffic types. For more information, see “Enable in-flight Encryption to a vSAN Cluster.”
Figure 1. Enabling vSAN Encryption Services at the cluster level.
Both encryption services use the same FIPS 140-2 validated cryptographic modules that vSphere uses for VM Encryption. While outside the scope of this article, VM Encryption is a vSphere feature that encrypts individual VMs and was primarily intended for traditional storage architectures. For answers to many commonly asked questions about vSAN encryption services, see the FAQs on vSAN Security.
Recommendation. Refrain from using the vSAN encryption offerings in combination with vSphere’s VM Encryption. Doing so creates unnecessary overhead for no real improvement in security, and it diminishes space efficiency when the Deduplication and Compression service is used, because data already encrypted by VM Encryption cannot be effectively deduplicated or compressed.
Will enabling vSAN encryption services have a performance impact on my vSAN cluster?
This question, as stated, is too vague to answer accurately. We must split the definition of “impact” into two specific parts: (1) additional overhead on the hosts, and (2) performance impact on the VMs.
Additional Overhead on the Hosts
The “overhead” refers to the additional physical resources consumed as a result of enabling encryption. Any change in resource utilization from adding encryption services will come primarily in the form of CPU and memory. How much? This will vary for a few very important reasons:
- Additional CPU resources used on each host. This depends on the encryption processing efficiency of your CPUs. The cryptographic modules used for vSAN encryption services take advantage of the dedicated AES-NI instruction set found on modern CPUs. Newer CPU generations are much more efficient at accelerating these encryption operations and, compared with older hosts or hosts with AES-NI disabled, will show lower CPU utilization in the vSphere host CPU metrics.
- Additional memory resources used on each host. The amount of memory consumed is dynamic and will be based in part on the amount of I/O activity from the VMs. The vSAN Memory metric will allow you to view the memory usage of vSAN on a per-host basis throughout the cluster. This can be found by highlighting the cluster and clicking Monitor > vSAN > Support > Performance for Support > vSAN Memory. Observing this metric before and after encryption service settings are changed will be the most accurate way to understand the overhead for your environment.
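Because this comparison is simply before versus after, the arithmetic is easy to script. The following Python sketch assumes you have noted a few vSAN Memory readings per host from the performance chart, once before and once after changing encryption settings. The host names, values, and the function name are hypothetical illustrations, not part of any VMware tool.

```python
from statistics import mean

# Hypothetical vSAN Memory readings (MB) per host, noted by hand from the
# vSAN Memory performance chart. These numbers are illustrative only.
BEFORE_MB = {
    "esx-01": [10240, 10310, 10275],
    "esx-02": [10190, 10220, 10205],
}
AFTER_MB = {
    "esx-01": [10800, 10860, 10830],
    "esx-02": [10700, 10760, 10730],
}


def memory_overhead_by_host(before, after):
    """Return the change in average vSAN memory (MB) for each host.

    Only hosts present in both sample sets are compared.
    """
    return {
        host: mean(after[host]) - mean(before[host])
        for host in before
        if host in after
    }


if __name__ == "__main__":
    deltas = memory_overhead_by_host(BEFORE_MB, AFTER_MB)
    for host, delta in sorted(deltas.items()):
        print(f"{host}: {delta:+.1f} MB")
```

Averaging several readings per host smooths out the moment-to-moment variation that comes from changing I/O activity.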
Increased overhead typically reduces the number of VMs each host can accommodate. It may also affect performance when little headroom remains to absorb the additional demand.
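To see how overhead translates into VM density, consider a simple capacity calculation. This sketch assumes memory is the limiting resource, that every VM is the same size, and that memory is not overcommitted. All figures are hypothetical placeholders, not vSAN sizing guidance.

```python
# Hypothetical host and VM sizes in GB; replace with your own measurements.
HOST_MEMORY_GB = 512
BASE_OVERHEAD_GB = 32        # assumed hypervisor + vSAN memory before encryption
ENCRYPTION_OVERHEAD_GB = 6   # assumed additional memory after enabling encryption
VM_MEMORY_GB = 16


def vms_per_host(host_memory_gb, overhead_gb, per_vm_gb):
    """Whole number of equally sized VMs that fit after subtracting overhead."""
    usable = host_memory_gb - overhead_gb
    if usable <= 0:
        return 0
    return int(usable // per_vm_gb)


if __name__ == "__main__":
    before = vms_per_host(HOST_MEMORY_GB, BASE_OVERHEAD_GB, VM_MEMORY_GB)
    after = vms_per_host(
        HOST_MEMORY_GB, BASE_OVERHEAD_GB + ENCRYPTION_OVERHEAD_GB, VM_MEMORY_GB
    )
    print(f"VMs per host before: {before}, after: {after}")
```

With these placeholder numbers, a few additional gigabytes of overhead per host drops density from 30 to 29 VMs, which is why headroom matters most on hosts that are already close to full.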
Performance Impact on the VMs
Encryption and decryption steps occur for all types of vSAN storage traffic, including resynchronizations, but the most meaningful measure of performance impact is guest VM latency, as shown in the vSAN performance metrics.
- vSAN Data-at-Rest Encryption. Increased guest VM latency may be observed as data is written to the storage devices. Encryption occurs as data is written to the write buffer, and because vSAN does not send the write acknowledgment back to the VM until the synchronous write is completed in the buffer, this may translate into higher guest VM latency. The destaging process must also decrypt and re-encrypt the data as it is destaged to the capacity tier. While this does not directly affect guest VM latency, it can slow destaging, which, under heavy load or on hardware unable to keep up with demand, could indirectly increase VM latency.
- vSAN Data-in-Transit Encryption. Increased guest VM latency may be observed because synchronous writes from the guest VMs must be encrypted and decrypted as they travel between hosts. Since vSAN writes data synchronously, anything that slows any of the devices or network connections involved in completing the write will affect latency. The principles of this are demonstrated in the post: Performance with vSAN Stretched Clusters.
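The effect of synchronous writes can be illustrated with a deliberately simplified model: a write is acknowledged only when every replica has committed it, so an added step on any replica path, such as encryption, can raise the latency the VM sees. This is a conceptual sketch, not a description of vSAN internals; the timing components and microsecond values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplicaPath:
    """Hypothetical timing components (microseconds) for one replica of a write."""
    network_us: int       # time to reach the host holding the replica (0 if local)
    in_transit_us: int    # assumed Data-in-Transit encrypt/decrypt cost (0 if local)
    at_rest_us: int       # assumed Data-at-Rest encryption cost before the buffer write
    buffer_write_us: int  # time to commit the write to the write buffer device


def write_latency_us(replicas):
    """Latency of a synchronous write: it is acknowledged only after every
    replica has committed it, so the slowest replica path determines it."""
    return max(
        r.network_us + r.in_transit_us + r.at_rest_us + r.buffer_write_us
        for r in replicas
    )


if __name__ == "__main__":
    # Two mirrored replicas on remote hosts, without and with encryption.
    plain = [ReplicaPath(100, 0, 0, 200), ReplicaPath(150, 0, 0, 250)]
    encrypted = [ReplicaPath(100, 20, 30, 200), ReplicaPath(150, 20, 30, 250)]
    print("without encryption:", write_latency_us(plain), "us")
    print("with encryption:   ", write_latency_us(encrypted), "us")
```

Taking the maximum rather than the average reflects the synchronous behavior described above: the VM waits for the slowest replica.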
The impact of vSAN Data-at-Rest Encryption, vSAN Data-in-Transit Encryption, or both will be workload-dependent. Encryption activity occurs only when the VM is reading from or writing to disk, whether local or remote. Real workloads have unique I/O behavior. Some VMs, such as in-memory databases, may have high CPU activity but proportionally little I/O activity relative to that CPU usage. Other applications may have modest CPU activity but proportionally high I/O activity. The only way to determine the additional overhead on the hosts and the performance impact on the VMs is to test with your real workloads on your existing hardware. For these reasons, a one-size-fits-all percentage of impact across all environments would be neither accurate nor useful.
Can one use a synthetic I/O generator or benchmarking tool to determine impact? It can be done, but the results will not be representative of what you will see with real workloads. Why? Synthetic benchmarking typically devotes all of its effort to generating and transmitting as much I/O as possible. With real workloads, only a fraction of the CPU cycles consumed by the VM are related to I/O activity, and this ratio depends on the workload and changes from second to second. Therefore, when evaluating the impact of encryption services in a vSAN environment, we advise testing with production workloads.
Figure 2. Comparing I/O and CPU activity for a synthetic I/O generator versus a normal workload.
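The reasoning about CPU-to-I/O ratios can be expressed as a single multiplication. Assuming, hypothetically, that encryption adds a fixed fraction of extra cycles to every cycle a VM spends on I/O, the increase in the VM's total CPU usage scales with how much of its work is I/O. The cost ratio and I/O fractions below are illustrative only.

```python
# Hypothetical: encryption adds 20% extra cycles to I/O-related work.
CRYPTO_COST_RATIO = 0.20


def relative_cpu_increase(io_fraction, crypto_cost_ratio=CRYPTO_COST_RATIO):
    """Fractional increase in a VM's total CPU cycles.

    io_fraction: share (0..1) of the VM's CPU cycles spent on I/O.
    crypto_cost_ratio: assumed extra cycles per I/O-related cycle.
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    return io_fraction * crypto_cost_ratio


if __name__ == "__main__":
    # A synthetic I/O generator spends nearly all of its cycles on I/O;
    # the "real workload" figure is an arbitrary example.
    for name, io_fraction in [("synthetic", 1.0), ("real workload", 0.1)]:
        print(f"{name}: +{relative_cpu_increase(io_fraction):.1%} CPU")
```

Under these assumptions, the same per-I/O encryption cost looks ten times larger for the synthetic generator than for the real workload, which illustrates why synthetic results are not representative of production workloads.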
We generally recommend enabling or disabling encryption services when a new cluster is deployed. This recommendation stems from the rolling reformat of the storage devices that occurs when Data-at-Rest Encryption is enabled or disabled. It is an automated process, but it does take time and resources. That said, enabling or disabling it on a production cluster with live workloads is perfectly acceptable, whether you wish to adopt the feature or evaluate encryption using the vSAN Proof of Concept Guide.
Recommendations
These recommendations may help you implement vSAN encryption services in your environment.
- Understand the tradeoffs. Much like other vSAN data services, enabling encryption may introduce tradeoffs. These performance or overhead tradeoffs can be offset through hardware decisions, but they should be addressed in the design and sizing process.
- Confirm your business requirements. If your organization requires some level of encryption, let that be reflected in your hardware specifications and design, so that performance SLAs are not compromised in the effort to meet security SLAs.
- Enable encryption on one of your vSAN clusters first to better understand operational and performance considerations. This allows for easy evaluation without affecting all of your workloads. You can even carve out a small temporary cluster for an easy, isolated evaluation.
- Take advantage of HCI Mesh when using Data-at-Rest Encryption. HCI Mesh allows you to encrypt a single VM’s data by having it use storage capacity on a remote vSAN cluster that has Data-at-Rest Encryption enabled.
- Run the latest version of vSAN. Performance generally improves and overhead decreases with each release of vSAN. Running the latest version is a simple way to improve performance.
- Do not use vSAN encryption services and vSphere’s VM Encryption together. Choose one or the other, but not both. See “Using Data-at-Rest Encryption and VM Encryption Together” in the vSAN Operations Guide for more information.
- Use the latest generations of servers. Modern processors handle AES acceleration much more efficiently than older generations.
- Use fast NVMe storage devices. NVMe devices (NAND- or Optane-based) not only perform better but also consume just one-third of the CPU cycles per I/O that SATA devices consume. This helps offset some of the computational overhead that may occur with encryption services and delivers more consistent performance than other bus protocols. Faster devices at the buffer tier will lower the latencies seen by the guest VMs.
- Use a “Trusted Platform Module” (TPM) on all vSphere hosts. TPM chips are affordable devices that allow security-conscious customers to address their security concerns much more easily. At less than 50 USD per module, the cost of TPMs is minimal, yet their functionality is significant.
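The NVMe recommendation can be put into rough numbers. The sketch below converts per-I/O CPU cycles into CPU cores consumed at a given I/O rate, using the one-third ratio stated in the NVMe recommendation. The absolute cycle counts, the encryption cost per I/O, the clock speed, and the I/O rate are all hypothetical placeholders chosen only to show the arithmetic.

```python
# All constants are hypothetical placeholders, not measured vSAN figures.
SATA_CYCLES_PER_IO = 30_000
NVME_CYCLES_PER_IO = SATA_CYCLES_PER_IO // 3  # "one-third" of SATA, per the text
CRYPTO_CYCLES_PER_IO = 5_000                  # assumed extra cost of encryption per I/O
CORE_HZ = 2_500_000_000                       # assumed 2.5 GHz core


def cores_needed(iops, device_cycles_per_io, crypto_cycles_per_io=0):
    """CPU cores consumed to sustain `iops` at the given per-I/O cycle costs."""
    cycles_per_second = iops * (device_cycles_per_io + crypto_cycles_per_io)
    return cycles_per_second / CORE_HZ


if __name__ == "__main__":
    iops = 50_000  # hypothetical host-wide I/O rate
    sata_plain = cores_needed(iops, SATA_CYCLES_PER_IO)
    nvme_encrypted = cores_needed(iops, NVME_CYCLES_PER_IO, CRYPTO_CYCLES_PER_IO)
    print(f"SATA, no encryption: {sata_plain:.2f} cores")
    print(f"NVMe, encrypted:     {nvme_encrypted:.2f} cores")
```

With these placeholder values, the encrypted NVMe configuration consumes half the cores of the unencrypted SATA configuration; the real ratios depend on your hardware and must be measured.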
You can find more tips and answers to commonly asked questions about vSAN encryption services in the vSAN FAQs under “Security.”
Summary
VMware makes it easy to run a vSAN cluster that provides end-to-end encryption. Understanding what to look for, and how to measure this behavior before and after encryption is enabled, will tell you how much impact enabling vSAN encryption services will have in your environment.