vSGA, or Virtual Shared Graphics Acceleration, allows multiple VMware vSphere virtual machines to share hardware GPUs. We have advocated in previous blog articles the use of NVIDIA GRID vGPU technology, and this is a good solution for many use cases. In this blog, we look at the performance of vGPU technology vs. vSGA while limiting our testing to a workload generated by VMware Horizon 7 VDI desktops. Based on our measurements (we present some of that data in this blog) vSGA provides performance very close to vGPU when using a variety of software applications, including Microsoft Office, Adobe Acrobat, CAD viewers, YouTube video, and viewing or working with WebGL-based images.
Leaving CPU Hot Add at its default setting of disabled is one of the performance best practices that we have for large VMs. From the Performance Best Practices Guide for vSphere 6.7 U2:
CPU Hot Add is a feature that allows the addition of vCPUs to a running virtual machine. Enabling this feature, however, disables vNUMA for that virtual machine, resulting in the guest OS seeing a single vNUMA node. Without vNUMA support, the guest OS has no knowledge of the CPU and memory virtual topology of the ESXi host. This in turn could result in the guest OS making sub-optimal scheduling decisions, leading to reduced performance for applications running in large virtual machines. For this reason, enable CPU Hot Add only if you expect to use it. Alternatively, plan to power down the virtual machine before adding vCPUs, or configure the virtual machine with the maximum number of vCPUs that might be needed by the workload. If choosing the latter option, note that unused vCPUs incur a small amount of unnecessary overhead. Unused vCPUs could also cause the guest OS to make poor scheduling decisions within the virtual machine, again with the potential for reduced performance. For additional information see VMware KB article 2040375.
The reason for this is that if you enable CPU Hot Add, virtual NUMA is disabled. This means that the VM is not aware of which of its vCPUs are on the same NUMA node and might increase remote memory access. This removes the ability for the guest OS and applications to optimize based on NUMA and results in a possible reduction in performance.
Virtual NUMA (vNUMA) exposes NUMA topology to the guest operating system, allowing NUMA-aware guest operating systems and applications to make the most efficient use of the underlying hardware’s NUMA architecture. (For more information about NUMA, see page 27 in the Performance Best Practices Guide for vSphere 6.7 U2.)
To get an idea of what the performance impact can be by enabling CPU Hot Add, a simple test was run in our lab environment. This test found performance with the default setting of CPU Hot Add disabled performed from 2% to 8% better than when CPU Hot Add was enabled.
By Dave Jaffe, VMware Performance Engineering
A new white paper is available showing the advantages of running virtualized Spark Deep Learning workloads on Kubernetes.
Recent versions of Spark include support for Kubernetes. For Spark on Kubernetes, the Kubernetes scheduler provides the cluster manager capability provided by Yet Another Resource Negotiator (YARN) in typical Spark on Hadoop clusters. Upon receiving a spark-submit command to start an application, Kubernetes instantiates the requested number of Spark executor pods, each with one or more Spark executors.
The benefits of running Spark on Kubernetes are many: ease of deployment, resource sharing, simplifying the coordination between developer and cluster administrator, and enhanced security. A standalone Spark cluster on vSphere virtual machines running in the same configuration as a Kubernetes-managed Spark cluster on vSphere virtual machines were compared for performance using a heavy workload, and the difference imposed by Kubernetes was found to be insignificant.
Spark applications running in Standalone mode require that every Spark worker node be installed with the correct version of Spark, Python, Java, etc. This puts a burden on the IT administrator, who may be managing many Spark applications with different requirements, and it requires coordination between the administrator and the application developer. With Kubernetes, the developer only needs to create a container with the correct software, and the IT administrator just needs to manage the cluster using the fine-grained resource management tools to enable the different Spark workloads.
To compare Spark Standalone performance to Spark on Kubernetes performance, a Deep Learning workload, the Maximum Throughput Spark BigDL ResNet50 image classifier from VMware IoT Analytics Benchmark, was run on the same 16 worker nodes, first while configured as Spark worker nodes, then while configured as Kubernetes nodes. Then the number of nodes was reduced by four (by removing the four workers on host 4), and the same comparison was made using 12 nodes, then 8, then 4.
The relative results are shown below. The Spark Standalone and Spark on Kubernetes performance in terms of images per second classified was within ~1% of each other for all configurations. Performance scaled well for the Spark tests as the number of VMs increased from 4 (1 server) to 16 (4 servers).
All details are in the paper.
The VMware Performance team has published an updated paper detailing vCenter Server 6.7 performance in a remote offices and branch offices (ROBO) environment.
Many organizations today have a ROBO environment with local IT infrastructure. These remote locations usually have anywhere from a few servers running a few workloads to support local needs, to numerous servers spanning a large-scale datacenter. The distributed and remote nature of this infrastructure makes it hard to manage, difficult to protect, and costly to maintain. Further, the remote nature of servers makes it more challenging to perform important VM/host-related operations.
vSphere is designed to address these ROBO use cases, including IT infrastructure located in remote, distributed sites. VMware vCenter Server provides a centralized way to control and monitor the virtual infrastructure, including ESXi hosts, virtual machines, storage, and networking resources. It has been widely deployed in a ROBO environment to manage ESXi hosts that are distributed over large geographical distances over a wide range of networks with different network characteristics, including low/high bandwidth, network latency, and packet error rates. In the paper, we test:
- LAN with high-bandwidth and low-latency links.
- WAN with low-bandwidth and high-latency links.
- Various networks in between; for example, DSL, T1, 4G, 5G, …
We demonstrate that vCenter Server performs well in the ROBO environment for both network bandwidth use, as well as virtual machine and ESXi host task execution times. Instead of a bandwidth restriction, we observe that network latency has a bigger impact on the overall performance. As the network latency between vCenter Server and ESXi hosts increases, the average operation latency also increases. The experimental results also show how efficiently vCenter Server executes VM operations in high-latency networks: The average VM operation execution time increases much more slowly when network latency increases by several times.
vSphere 5.1 introduced an inventory tagging feature that has been available in all later versions of vSphere, including vSphere 6.7. Tags let datacenter administrators organize different vSphere objects like datastores, virtual machines, hosts, and so on. This makes it easier to sort and search for objects that share a tag, among other things. For example, you might use tags to track a group of VMs that all have the same operating system.
Writing code to use tags can be challenging in large-scale environments: a straightforward use of VMware PowerCLI cmdlets may result in poor performance, and while direct Tagging Service APIs are faster, the documentation can be difficult to understand. In this blog, we show some practical examples of using PowerCLI and Tagging Service APIs to perform tag-related operations. We include some simple measurements to show the performance improvements when using the Tagging Service vs. cmdlets. The sample performance numbers are for illustrative purposes only. We describe the test setup in the Appendix.
Update: There is also a white paper on this topic, which references this blog. See VMware vSphere 6.7 Tagging Best Practices.
1. Connecting to PowerCLI and the Tagging Service
In this document, when we write “PowerCLI cmdlets,” we mean calls like Get-Tag, or Get-TagCategory. To access this API, simply open a PowerShell terminal and log in:
Connect-VIServer <vCenter server IP or FQDN> -User <username> -Pass <password>
Along with the recent release of VMware vSphere 6.7 U2, we published a new whitepaper that shows the performance of a new scheduler option that was included in the 6.7 U2 update. We referred to this new scheduler option internally as the “sibling” scheduler, but the official name is the side-channel aware scheduler version 2, or SCAv2. The whitepaper includes full details about SCAv1 and SCAv2, the L1TF security vulnerability that made them necessary, and the performance implications with several different workload types. This blog is a brief overview of the key points, but we recommend that you check out the full document.
In August of 2018, a security vulnerability known as L1TF, affecting systems using Intel processors, was revealed, and patches and remediations were also made available. Intel provided micro-code updates for its processors, operating system patches were made available, and VMware provided an update for vSphere. The full details of the vCenter and ESXi patches are in a VMware security advisory that links to individual KB articles.
VMware Cloud on AWS is a hybrid cloud service that runs the VMware software-defined data center (SDDC) stack in the Amazon Web Services (AWS) public cloud. The service automatically provisions and deploys a vSphere environment on a bare-metal AWS infrastructure, and lets you run your applications in a hybrid IT environment across your on-premises data centers and AWS global infrastructure. A key benefit of VMware Cloud on AWS is the ability to vMotion workloads back and forth from your on-premises data center to the AWS public cloud as capacity and data privacy require.
In this blog post, we share the results of our vMotion performance tests across our hybrid cloud environment that consisted of a vSphere on-premises data center located in Wenatchee, Washington and an SDDC hosted in an AWS cloud, in various scenarios including hybrid migration of a database server. We also describe the best practices to follow when migrating virtual machines by vMotion across hybrid cloud.
By Amitabha Banerjee and Abhishek Srivastava
The vSAN Performance Diagnostics feature, which helps customers to optimize their benchmarks or their vSAN configurations to achieve the best possible performance, was first introduced in vSphere 6.5 U1. vSAN Performance Diagnostics is a “cloud connected” feature and requires participation in the VMware Customer Experience Improvement Program (CEIP). Performance metrics and data are collected from the vSAN cluster and are sent to the VMware Cloud. The data is analyzed and the results are sent back for display in the vCenter Client. These results are shown as performance issues, where each issue includes a problem with its description and a link to a KB article.
In this blog, we describe how vSAN Performance Diagnostics can be used with HCIBench and show the new feature in vSphere 6.7 U1 that provides HCIBench specific issues and recommendations.
What is HCIBench?
HCIBench (Hyper-converged Infrastructure Benchmark) is a standard benchmark that vSAN customers can use to evaluate the performance of their vSAN systems. HCIBench is an automation wrapper around the popular and proven VDbench open source benchmark tool that makes it easier to automate testing across an HCI cluster. HCIBench, available as a fling, simplifies and accelerates customer performance testing in a consistent and controlled way.
Virtual machine (VM) provisioning operations such as create, clone, and relocate involve the placement of storage resources. Storage DRS (sometimes seen as “SDRS”) is the resource management component in vSphere responsible for optimal storage placement and load balancing recommendations in the datastore cluster.
A key contributor to VM provisioning times in Storage DRS-enabled environments is the time it takes (latency) to receive placement recommendations for the VM disks (VMDKs). This latency particularly comes into play when multiple VM provisioning requests are issued concurrently.
Several changes were made in vSphere 6.7 to improve the time to generate placement recommendations for provisioning operations. Specifically, the level of parallelism was improved for the case where there are no storage reservations for VMDKs. This resulted in significant improvements in recommendation times when there are concurrent provisioning requests.
vRealize automation suite users who use blueprints to deploy large numbers of VMs quickly will notice the improvement in provisioning times for the case when no reservations are used.
Several performance optimizations were further made inside key steps of processing the Storage DRS recommendations. This improved the time to generate recommendations, even for standalone provisioning requests with or without reservations.
vSphere 6.7 U1 includes several enhancements in Storage Policy-Based Management (SPBM) to significantly reduce CPU use and generate a much faster response time for compliance checking operations.
SPBM is a framework that allows vSphere users to translate their workload’s storage requirements into rules called storage policies. Users can apply storage policies to virtual machines (VMs) and virtual machine disks (VMDKs) using the vSphere Client or through the VMware Storage Policy API’s rich set of managed objects and methods. One such managed object is PbmComplianceManager. One of its methods, PbmCheckCompliance, helps users determine whether or not the storage policy attached to their VM is being honored.
PbmCheckCompliance is automatically invoked soon after provisioning operations such as creating, cloning, and relocating a VM. It is also automatically triggered in the background once every 8 hours to help keep the compliance records up-to-date.