Along with the recent release of VMware vSphere 6.7 U2, we published a new whitepaper that shows the performance of a new scheduler option included in that update. We referred to this new scheduler option internally as the “sibling” scheduler, but the official name is the side-channel aware scheduler version 2, or SCAv2. The whitepaper includes full details about SCAv1 and SCAv2, the L1TF security vulnerability that made them necessary, and the performance implications with several different workload types. This blog post is a brief overview of the key points, but we recommend that you check out the full document.
In August of 2018, a security vulnerability known as L1TF, which affects systems using Intel processors, was disclosed along with patches and remediations. Intel provided microcode updates for its processors, operating system patches were made available, and VMware provided an update for vSphere. The full details of the vCenter and ESXi patches are in a VMware security advisory that links to individual KB articles.
The ESXi-provided patches included a side-channel aware scheduler (SCAv1) that mitigated the concurrent-context attack vector for L1TF. Once that mode was enabled, the scheduler would only schedule processes on one thread for each core. This mode impacted performance mostly from a capacity standpoint because the system was no longer able to use both hyper-threads on a core. A server that was already fully utilized and running at maximum capacity would see a decrease in capacity of up to approximately 30%. A server that was running at 75% of capacity would see a much smaller impact to performance, but CPU utilization would rise.
In vSphere 6.7 U2, the side-channel aware scheduler has been enhanced (SCAv2) with a new policy to allow hyper-threads to be used concurrently if both threads are running vCPU contexts from the same VM. In this way, L1TF side channels are constrained to not expose information across VM/VM or VM/hypervisor boundaries.
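The co-scheduling rule described above can be illustrated with a small Python sketch. This is only a toy model of the policy as stated in this post, not ESXi's actual scheduler code: the `Context` type, the `may_share_core` function, and the scheduler labels are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    """What is running on one hyper-thread (hypothetical model)."""
    kind: str                    # "vcpu" or "hypervisor"
    vm_id: Optional[str] = None  # set only for vCPU contexts


def may_share_core(a: Context, b: Context, scheduler: str) -> bool:
    """Return True if contexts a and b may run at the same time on the
    two hyper-threads of one core under the given scheduler policy."""
    if scheduler == "default":
        # No side-channel mitigation: any two contexts may share a core.
        return True
    if scheduler == "SCAv1":
        # Only one hyper-thread per core is ever used.
        return False
    if scheduler == "SCAv2":
        # Both threads must be running vCPUs that belong to the same VM.
        return (
            a.kind == "vcpu"
            and b.kind == "vcpu"
            and a.vm_id is not None
            and a.vm_id == b.vm_id
        )
    raise ValueError(f"unknown scheduler: {scheduler}")


if __name__ == "__main__":
    vm1_a = Context("vcpu", "vm1")
    vm1_b = Context("vcpu", "vm1")
    vm2_a = Context("vcpu", "vm2")
    print(may_share_core(vm1_a, vm1_b, "SCAv2"))  # same VM
    print(may_share_core(vm1_a, vm2_a, "SCAv2"))  # different VMs
```

In this model, SCAv2 differs from SCAv1 only in the same-VM case, which is why a host running multi-vCPU VMs can regain use of both hyper-threads on a core.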
Performance testing with several different workloads found a range of performance impact for both SCAv1 and SCAv2. Results are normalized to the default scheduler as the baseline: a scheduler that achieves the same performance as the default scores 1.0, and one that achieves 75% of that performance scores 0.75. The graphs here show the performance impact at maximum server utilization and at a reduced load of approximately 75% utilization.
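As a quick illustration of how such normalized scores are computed and read, here is a minimal Python sketch. The function names and the throughput numbers are made up for the example and are not taken from the whitepaper.

```python
def normalized_performance(measured: float, baseline: float) -> float:
    """Score a result relative to the default-scheduler baseline (1.0 = equal)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return measured / baseline


def percent_impact(normalized: float) -> float:
    """Convert a normalized score into a percentage performance loss."""
    return (1.0 - normalized) * 100.0


if __name__ == "__main__":
    # Hypothetical throughput values (e.g. transactions per second).
    baseline_tps = 1000.0
    sca_tps = 750.0

    score = normalized_performance(sca_tps, baseline_tps)
    print(score)                  # 0.75
    print(percent_impact(score))  # 25.0
```

Read this way, a bar at 0.75 on the charts corresponds to a 25% drop relative to the default scheduler.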
The charts show that the SCAv2 scheduler, represented by the third bar in each group, recovers a significant percentage of performance in all cases except the monster VM test case. That test used a single large Oracle database VM with 192 vCPUs that consumed an entire 4-socket host. In configurations with a single monster VM that uses all the logical threads of the host, SCAv1 had a slight performance advantage over SCAv2 in our testing.
The reduced load numbers show that at server usage levels of approximately 75%, the overall impact to performance is much lower. With SCAv2 and the overall load below 75%, the largest performance impact measured in these tests was 11%. The SCAv2 scheduler option, available in vSphere 6.7 U2, provides better performance than SCAv1 in almost all cases.
For full details about the individual benchmark tests as well as more details about L1TF and VMware’s response to it, please see the full whitepaper and VMware KB 55806.