Home > Blogs > VMware VROOM! Blog > Tag Archives: Performance

Tag Archives: Performance

Introducing VMmark ML

VMmark has been the go-to virtualization benchmark for over 12 years. It’s been used by partners, customers, and internally in a wide variety of technical applications. VMmark1, released in 2007, was the de-facto virtualization consolidation benchmark in a time when the overhead and feasibility of virtualization was still largely in question. In 2010, as server consolidation became less of an “if” and more of a “when,” VMmark2 introduced more of the rich vSphere feature set by incorporating infrastructure workloads (VMotion, Storage VMotion, and Clone & Deploy) alongside complex application workloads like DVD Store. Fast forward to 2017, and we released VMmark3, which builds on the previous versions by integrating an easy automation deployment service alongside complex multi-tier modern application workloads like Weathervane. To date, across all generations, we’ve had nearly 300 VMmark result publications (297 at the time of this writing) and countless internal performance studies.

Unsurprisingly, tech industry environments have continued to evolve, and so must the benchmarks we use to measure them. It’s in this vein that the VMware VMmark performance team has begun experimenting with other use cases that don’t quite fit the “traditional” VMmark benchmark. One example of a non-traditional use is Machine Learning and its execution within Kubernetes clusters. At the time of this writing, nearly 9% of the VMworld 2019 US sessions are about ML and Kubernetes. As such, we thought this might be a good time to provide an early teaser to VMmark ML and even point you at a couple of other performance-centric Machine Learning opportunities at VMworld 2019 US.

Although it’s very early in the VMmark ML development cycle, we understand that there’s a need for push-button-easy, vSphere-based Machine Learning performance analysis. As an added bonus, our prototype runs within Kubernetes, which we believe to be well-suited for this type of performance analysis.

Our internal-only VMmark ML prototype is currently streamlined to efficiently perform a limited number of operations very well as we work with partners, customers, and internal teams on how VMmark ML should be exercised. It is able to:

  1. Rapidly deploy Kubernetes within a vSphere environment.
  2. Deploy a variety of containerized ML workloads within our newly created VMmark ML Kubernetes cluster.
  3. Execute these ML workloads either in isolation or concurrently to determine the performance impact of architectural, hardware, and software design decisions.

VMmark ML development is still very fluid right now, but we decided to test some of these concepts/assumptions in a “real-world” situation. I’m fortunate to work alongside long-time DVD Store author and Big Data guru Dave Jaffe on VMmark ML.  As he and Sr. Technical Marketing Architect Justin Murray were preparing for their VMworld US talk, “High-Performance Virtualized Spark Clusters on Kubernetes for Deep Learning [BCA1563BU]“, we thought this would be a good opportunity to experiment with VMmark ML. Dave was able to use the VMmark ML prototype to deploy a 4-node Kubernetes cluster onto a single vSphere host with a 2nd-Generation Intel® Xeon® Scalable processor (“Cascade Lake”) CPU. VMmark ML then pulled a previously stored Docker container with several MLperf workloads contained within it. Finally, as a concurrent execution exercise, these workloads were run simultaneously, pushing the CPU utilization of the server above 80%. Additionally, Dave is speaking about vSphere Deep Learning performance in his talk “Optimize Virtualized Deep Learning Performance with New Intel Architectures [MLA1594BU],“ where he and Intel Principal Engineer Padma Apparao explore the benefits of Vector Neural Network Instructions (VNNI). I definitely recommend either of these talks if you want a deep dive into the details of VNNI or Spark analysis.

Another great opportunity to learn about VMware Performance team efforts within the Machine Learning space is to attend the Hands-on-Lab Expert Lead Workshop, “Launch Your Machine Learning Workloads in Minutes on VMware vSphere [ELW-2048-01-EMT_U],” or take the accompanying lab. This is being led by another VMmark ML team member Uday Kurkure along with Staff Global Solutions Consultant Kenyon Hensler. (Sign up for the Expert Lead using the VMworld 2019 mobile application or on my.vmworld.com.)

Our goal after VMworld 2019 US is to continue discussions with partners, customers, and internal teams about how a benchmark like VMmark ML would be most useful. We also hope to complete our integration of Spark within Kubernetes on vSphere and reproduce some of the performance analysis done to date. Stay tuned to the performance blog for additional posts and details as they become available.

Extreme Performance Series at VMworld 2019

 

I’m excited to announce that the “Extreme Performance Series” is back for its 7th year with 14 sessions created and being presented by VMware’s best and most distinguished performance engineers, principals, architects, and gurus. You do not want to miss this years program as it’s chock full of advanced content, practical advice, and exciting technical details!

Breakout Sessions

Spread across 4 different VMworld tracks, you’ll find these sessions full of performance details that you won’t get anywhere else at VMworld. They’ll also be recorded, so if the sessions you want to see aren’t being hosted in your region, you’ll still get access to it.

Continue reading

New Scheduler Option for vSphere 6.7 U2

Along with the recent release of VMware vSphere 6.7 U2, we published a new whitepaper that shows the performance of a new scheduler option that was included in the 6.7 U2 update.  We referred to this new scheduler option internally as the “sibling” scheduler, but the official name is the side-channel aware scheduler version 2, or SCAv2.  The whitepaper includes full details about SCAv1 and SCAv2, the L1TF security vulnerability that made them necessary, and the performance implications with several different workload types.  This blog is a brief overview of the key points, but we recommend that you check out the full document.

In August of 2018, a security vulnerability known as L1TF, affecting systems using Intel processors, was revealed, and patches and remediations were also made available. Intel provided micro-code updates for its processors, operating system patches were made available, and VMware provided an update for vSphere. The full details of the vCenter and ESXi patches are in a VMware security advisory that links to individual KB articles.

Continue reading

First VMmark 3.1 Publications, Featuring New Cascade Lake Processors

VMmark is a free tool used by hardware vendors and others to measure the performance, scalability, and power consumption of virtualization platforms.  If you’re unfamiliar with VMmark 3.x, each tile is a grouping of 19 virtual machines (VMs) simultaneously running diverse workloads commonly found in today’s data centers, including a scalable Web simulation, an E-commerce simulation (with backend database VMs), and standby/idle VMs.

As Joshua mentioned in a recent blog post, we released VMmark 3.1 in February, adding support for persistent memory, improving workload scalability, and better reflecting secure customer environments by increasing side-channel vulnerability mitigation requirements.

I’m happy to announce that today we published the first VMmark 3.1 results.  These results were obtained on systems meeting our industry-leading side-channel-aware mitigation requirements, thus continuing the benchmark’s ability to provide an indication of real-world performance.

Continue reading

vMotion across hybrid cloud: performance and best practices

VMware Cloud on AWS is a hybrid cloud service that runs the VMware software-defined data center (SDDC) stack in the Amazon Web Services (AWS) public cloud. The service automatically provisions and deploys a vSphere environment on a bare-metal AWS infrastructure, and lets you run your applications in a hybrid IT environment across your on-premises data centers and AWS global infrastructure. A key benefit of VMware Cloud on AWS is the ability to vMotion workloads back and forth from your on-premises data center to the AWS public cloud as capacity and data privacy require.

In this blog post, we share the results of our vMotion performance tests across our hybrid cloud environment that consisted of a vSphere on-premises data center located in Wenatchee, Washington and an SDDC hosted in an AWS cloud, in various scenarios including hybrid migration of a database server. We also describe the best practices to follow when migrating virtual machines by vMotion across hybrid cloud.

Continue reading

vSAN Performance Diagnostics Now Shows “Specific Issues and Recommendations” for HCIBench

By Amitabha Banerjee and Abhishek Srivastava

The vSAN Performance Diagnostics feature, which helps customers to optimize their benchmarks or their vSAN configurations to achieve the best possible performance, was first introduced in vSphere 6.5 U1. vSAN Performance Diagnostics is a “cloud connected” feature and requires participation in the VMware Customer Experience Improvement Program (CEIP). Performance metrics and data are collected from the vSAN cluster and are sent to the VMware Cloud. The data is analyzed and the results are sent back for display in the vCenter Client. These results are shown as performance issues, where each issue includes a problem with its description and a link to a KB article.

In this blog, we describe how vSAN Performance Diagnostics can be used with HCIBench and show the new feature in vSphere 6.7 U1 that provides HCIBench specific issues and recommendations.

What is HCIBench?

HCIBench (Hyper-converged Infrastructure Benchmark) is a standard benchmark that vSAN customers can use to evaluate the performance of their vSAN systems. HCIBench is an automation wrapper around the popular and proven VDbench open source benchmark tool that makes it easier to automate testing across an HCI cluster. HCIBench, available as a fling, simplifies and accelerates customer performance testing in a consistent and controlled way.

Continue reading

Storage DRS Performance Improvements in vSphere 6.7

Virtual machine (VM) provisioning operations such as create, clone, and relocate involve the placement of storage resources. Storage DRS (sometimes seen as “SDRS”) is the resource management component in vSphere responsible for optimal storage placement and load balancing recommendations in the datastore cluster.

A key contributor to VM provisioning times in Storage DRS-enabled environments is the time it takes (latency) to receive placement recommendations for the VM disks (VMDKs). This latency particularly comes into play when multiple VM provisioning requests are issued concurrently.

Several changes were made in vSphere 6.7 to improve the time to generate placement recommendations for provisioning operations. Specifically, the level of parallelism was improved for the case where there are no storage reservations for VMDKs. This resulted in significant improvements in recommendation times when there are concurrent provisioning requests.

vRealize automation suite users who use blueprints to deploy large numbers of VMs quickly will notice the improvement in provisioning times for the case when no reservations are used.

Several performance optimizations were further made inside key steps of processing the Storage DRS recommendations. This improved the time to generate recommendations, even for standalone provisioning requests with or without reservations.

Continue reading

SPBM compliance check just got faster in vSphere 6.7 U1!

vSphere 6.7 U1 includes several enhancements in Storage Policy-Based Management (SPBM) to significantly reduce CPU use and generate a much faster response time for compliance checking operations.

SPBM is a framework that allows vSphere users to translate their workload’s storage requirements into rules called storage policies. Users can apply storage policies to virtual machines (VMs) and virtual machine disks (VMDKs) using the vSphere Client or through the VMware Storage Policy API’s rich set of managed objects and methods. One such managed object is PbmComplianceManager. One of its methods, PbmCheckCompliance, helps users determine whether or not the storage policy attached to their VM is being honored.

PbmCheckCompliance is automatically invoked soon after provisioning operations such as creating, cloning, and relocating a VM. It is also automatically triggered in the background once every 8 hours to help keep the compliance records up-to-date.

Continue reading

Sharing GPU for Machine Learning/Deep Learning on VMware vSphere with NVIDIA GRID: Why is it needed? And How to share GPU?

By Lan Vu, Uday Kurkure, and Hari Sivaraman 

Data scientists may use GPUs on vSphere that are dedicated to use by one virtual machine only for their modeling work, if they need to. Certain heavier machine learning workloads may well require that dedicated approach. However, there are also many ML workloads and user types that do not use a dedicated GPU continuously to its maximum capacity. This presents an opportunity for shared use of a physical GPU by more than one virtual machine/user. This article explores the performance of a shared-GPU setup like this, supported by the NVIDIA GRID product on vSphere, and presents performance test results that show that sharing is a feasible approach. The other technical reasons for sharing a GPU among multiple VMs are also described here. The article also gives best practices for determining how the sharing of a GPU may be done.

VMware vSphere supports NVIDIA GRID technology for multiple types of workloads. This technology virtualizes GPUs via a mediated passthrough mechanism. Initially, NVIDIA GRID supported GPU virtualization for graphics workloads only. But, since the introduction of Pascal GPU, NVIDIA GRID has supported GPU virtualization for both graphics and CUDA/machine learning workloads. With this support, multiple VMs running GPU-accelerated workloads like machine learning/deep learning (ML/DL) based on TensorFlow, Keras, Caffe, Theano, Torch, and others can share a single GPU by using a vGPU provided by GRID. This brings benefits in multiple use cases that we discuss on this post.  

Continue reading

VMware’s AI-based Performance Tool Can Improve Itself Automatically

PerfPsychic  our AI-based performance analyzing tool, enhances its accuracy rate from 21% to 91% with more data and training when debugging vSAN performance issues. What is better, PerfPsychic can continuously improve itself and the tuning procedure is automated. Let’s examine how we achieve this in the following sections.

How to Improve AI Model Accuracy

Three elements have huge impacts on the training results for deep learning models: amount of high-quality training data, reasonably configured hyperparameters that are used to control the training process, and sufficient but acceptable training time. In the following examples, we use the same training and testing dataset as we presented in our previous blog.

Continue reading