
Tag Archives: vmware

Storage DRS Performance Improvements in vSphere 6.7

Virtual machine (VM) provisioning operations such as create, clone, and relocate involve the placement of storage resources. Storage DRS (sometimes seen as “SDRS”) is the resource management component in vSphere responsible for optimal storage placement and load balancing recommendations in the datastore cluster.

A key contributor to VM provisioning times in Storage DRS-enabled environments is the time it takes (latency) to receive placement recommendations for the VM disks (VMDKs). This latency particularly comes into play when multiple VM provisioning requests are issued concurrently.

Several changes were made in vSphere 6.7 to improve the time to generate placement recommendations for provisioning operations. Specifically, the level of parallelism was improved for the case where there are no storage reservations for VMDKs. This resulted in significant improvements in recommendation times when there are concurrent provisioning requests.

vRealize Automation suite users who use blueprints to deploy large numbers of VMs quickly will notice the improvement in provisioning times for the case when no reservations are used.

Further performance optimizations were made inside key steps of Storage DRS recommendation processing. These improved the time to generate recommendations even for standalone provisioning requests, with or without reservations.

Test Setup and Results

We ran several performance tests to measure the improvement in recommendation times between vSphere 6.5 and vSphere 6.7. We ran these tests in our internal lab setup, which consists of hundreds of VMs and a few thousand VMDKs. The VM operations are:

  1. CreateVM – A single VM per thread is created.
  2. CloneVM – A single clone per thread is created.
  3. ReconfigureVM – A single VM per thread is reconfigured to add an additional VMDK.
  4. RelocateVM – A single VM per thread is relocated to a different datastore.
  5. DatastoreEnterMaintenance – Put a single datastore into maintenance mode. This is a non-concurrent operation.
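To give a concrete picture of how such a test might drive the concurrent operations, here is a minimal thread-pool sketch. The `issue_create_vm` function is a hypothetical stand-in for a real provisioning call; it only simulates the wait for a placement recommendation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def issue_create_vm(vm_id):
    """Hypothetical stand-in for a CreateVM call; returns the time (seconds)
    spent waiting for a simulated placement recommendation."""
    start = time.time()
    time.sleep(0.01)  # simulated recommendation latency
    return time.time() - start

def run_concurrent(op, concurrency):
    """Issue `concurrency` operations in parallel, one per thread,
    and return each operation's recommendation latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(op, range(concurrency)))

latencies = run_concurrent(issue_create_vm, 8)
avg_latency = sum(latencies) / len(latencies)
```

In the real tests, the per-thread operation is one of the five vSphere operations above, and the measured latency is the time to receive a Storage DRS placement recommendation.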

Shown below are the relative improvements in recommendation times for VM operations at varying concurrencies. The y-axis has a numerical limit of 10, to allow better visualization of the relative values of the average recommendation time. 

The concurrent VM operations show an improvement of between 20x and 30x in vSphere 6.7 compared to vSphere 6.5.

Below we see the relative average time taken among all runs for serial operations.

The Datastore Enter Maintenance operation shows an improvement of nearly 14x in vSphere 6.7 compared to vSphere 6.5.

With much faster Storage DRS recommendation times, we expect customers to be able to provision multiple VMs much faster to service their in-house demands. We expect VMware vRealize Automation suite users, in particular, to benefit greatly from these improvements.

SPBM compliance check just got faster in vSphere 6.7 U1!

vSphere 6.7 U1 includes several enhancements in Storage Policy-Based Management (SPBM) to significantly reduce CPU use and generate a much faster response time for compliance checking operations.

SPBM is a framework that allows vSphere users to translate their workload’s storage requirements into rules called storage policies. Users can apply storage policies to virtual machines (VMs) and virtual machine disks (VMDKs) using the vSphere Client or through the VMware Storage Policy API’s rich set of managed objects and methods. One such managed object is PbmComplianceManager. One of its methods, PbmCheckCompliance, helps users determine whether or not the storage policy attached to their VM is being honored.

PbmCheckCompliance is automatically invoked soon after provisioning operations such as creating, cloning, and relocating a VM. It is also automatically triggered in the background once every 8 hours to help keep the compliance records up-to-date.

In addition, users can invoke the method when checking compliance for a VM storage policy in the vSphere Client, or through the VMware Storage Policy API method PbmCheckCompliance.

We did a study in our lab to compare the performance of PbmCheckCompliance between vSphere 6.5 U2 and vSphere 6.7 U1. We present this comparison in the form of charts showing the latency (normalized on a 100-point scale) of PbmCheckCompliance for varying numbers of VMs.
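The 100-point normalization used in these charts simply expresses each latency relative to the slowest measurement. A minimal sketch, with made-up latency values:

```python
def normalize_100(values):
    """Scale measurements so the largest value maps to 100."""
    peak = max(values)
    return [100.0 * v / peak for v in values]

# Hypothetical raw latencies (ms) for increasing inventory sizes
raw = [120.0, 240.0, 480.0]
print(normalize_100(raw))  # -> [25.0, 50.0, 100.0]
```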

The following chart compares the performance of PbmCheckCompliance on VMFS and vSAN environments.

As we see from the above chart, PbmCheckCompliance returns results much faster in vSphere 6.7 U1 than in 6.5 U2. The improvement is seen across all inventory sizes and all datastore types, and it becomes more pronounced for larger inventories and higher numbers of VMs.

The enhancements also positively impact a similar method, PbmCheckRollupCompliance. This method also returns the compliance status of VMs and adds compliance results for all disks associated with these VMs. The following chart represents the performance comparison of PbmCheckRollupCompliance on VMFS and vSAN environments.

Our experiments show that compliance check operations are significantly faster and more lightweight in vSphere 6.7 U1.

DRS Enhancements in vSphere 6.7

A new paper describes the DRS enhancements in vSphere 6.7, which include new initial placement, host maintenance mode enhancements, DRS support for non-volatile memory (NVM), and enhanced resource pool reservations.

Resource pool and VM entitlements—old and new models

A summary of the improvements follows:

  • DRS in vSphere 6.7 now provides much faster placement and more accurate recommendations for all DRS configurations; vSphere 6.5 did not support some configurations, such as VMs with fault tolerance (FT) enabled, among others.
  • Starting with vSphere 6.7, DRS uses the new initial placement algorithm to come up with the recommended list of hosts to be placed in maintenance mode. Further, when evacuating the hosts, DRS uses the new initial placement algorithm to find new destination hosts for outgoing VMs.
  • DRS in vSphere 6.7 can handle VMs running on next generation persistent memory devices, also known as Non-Volatile Memory (NVM) devices.
  • There is a new two-pass algorithm that allocates a resource pool’s resource reservation to its children (also known as divvying).

For more information about all of these updates, see DRS Enhancements in vSphere 6.7.

VMware’s AI-based Performance Tool Can Improve Itself Automatically

PerfPsychic, our AI-based performance analysis tool, improves its accuracy rate from 21% to 91% with more data and training when debugging vSAN performance issues. Better still, PerfPsychic can continuously improve itself, and the tuning procedure is automated. Let’s examine how we achieve this in the following sections.

How to Improve AI Model Accuracy

Three elements have a huge impact on the training results for deep learning models: the amount of high-quality training data, reasonably configured hyperparameters that control the training process, and sufficient (but acceptable) training time. In the following examples, we use the same training and testing datasets we presented in our previous blog.

Amount of Training Data

The key claim of PerfPsychic is the effectiveness of our deep learning pipeline, so we start by gradually adding more labeled data to the training dataset. This demonstrates how our models learn from additional labeled data and improve their accuracy over time. Figure 1 shows the results, starting from only 20% of the training dataset and labeling 20% more each time. There is a clear trend: as more properly labeled data is added, our model learns and improves its accuracy, without any further human intervention. Accuracy improves from around 50% when we have only about 1,000 data points to 91% when we have the full set of 5,275 data points. That accuracy is as good as a programmatic analytic rule that took us three months to tune manually.

Figure 1. Accuracy improvement over larger training datasets

Training Hyperparameters

We next vary several other CNN hyperparameters to show how they were selected for our models, changing only one hyperparameter at a time and training 1,000 CNNs with each configuration. We first vary the number of training iterations, that is, how many times we pass over the training dataset. If the number of iterations is too small, the CNNs cannot be trained adequately; if it is too large, training takes much longer and may end up overfitting the training data. As shown in Figure 2, 50 to 75 iterations is the best range, with 75 iterations achieving the best accuracy of 91%.

Figure 2. Number of training iterations vs. accuracy

We next vary the step size, the granularity with which we search for the best model. In practice, with a small step size, optimization is so slow that it cannot reach the optimal point in a limited time; with a large step size, we risk stepping past optimal points. Figure 3 shows that the model produces good accuracy between 5e-3 and 7.5e-3, with 5e-3 predicting 91% of the labels correctly.

Figure 3. Step size vs. accuracy
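The step-size trade-off shows up even on a toy problem. The sketch below minimizes f(x) = x² by gradient descent (a stand-in for CNN training, with made-up step sizes unrelated to the 5e-3 range used for our models): a tiny step barely approaches the optimum within a fixed iteration budget, while an overly large step overshoots and diverges.

```python
def gradient_descent(step, iterations, x0=10.0):
    """Minimize f(x) = x**2 by gradient descent; the gradient is 2x."""
    x = x0
    for _ in range(iterations):
        x -= step * 2 * x
    return abs(x)  # distance from the optimum at x = 0

small = gradient_descent(step=1e-4, iterations=50)  # barely moves from x0
good  = gradient_descent(step=0.3,  iterations=50)  # converges to ~0
large = gradient_descent(step=1.1,  iterations=50)  # overshoots and diverges
```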

We last evaluate the impact of the issue rate of the training data on accuracy. The issue rate is the percentage of the training data that represents performance issues. In an ideal training set, all labels are equally represented to avoid overfitting; a biased dataset generally produces overfitted models that can barely achieve high accuracy. Figure 4 below shows that when the training data has an issue rate under 20% (that is, under 20% of the components are faulty), the model essentially overfits to the “noissue” data points and predicts that all components are issue-free. Because 21.9% of the components in our testing data have no issues, accuracy stays at 21.9%. In contrast, when the issue rate is over 80%, the model simply treats all components as faulty and thus achieves 78.1% accuracy. This is why it is important to ensure every label is equally represented, and why we mix our issue/noissue data in a ratio between 40% and 60%.

Figure 4. Impact of issue rate

Training Duration

Training time is also an important factor in a practical deep learning pipeline design. Because we train thousands of CNN models, spending one second longer to train each model makes a whole training phase take 1,000 seconds longer. Figure 5 below shows training time vs. data size and number of iterations. Both factors form a linear trend: with more data and more iterations, training takes linearly longer. Fortunately, we know from the study above that more than 75 iterations does not help accuracy. By limiting the number of iterations, we can complete a whole training phase in less than 9 hours. Again, once the off-line training is done, the model can perform real-time prediction in just a few milliseconds; training time simply affects how often and how fast the models can pick up new feedback from product experts.

Figure 5. Effect of data size and iteration on training time
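Because training time is linear in both data size and iteration count, a full training phase can be budgeted with a simple product. The per-data-point cost below is a hypothetical constant chosen only to illustrate the linear model; the real value depends on the model and GPU hardware:

```python
def phase_hours(models, data_points, iterations, secs_per_point_iter=7e-5):
    """Estimate a full training phase under a linear cost model:
    per-model time = data_points * iterations * secs_per_point_iter
    (secs_per_point_iter is a hypothetical constant for illustration)."""
    per_model = data_points * iterations * secs_per_point_iter
    return models * per_model / 3600.0

# 1,000 models, the full 5,275-point dataset, capped at 75 iterations
est = phase_hours(models=1000, data_points=5275, iterations=75)
```

With this (assumed) constant, the estimate lands under the 9-hour budget; doubling the iteration count would double the phase length, which is why capping iterations at 75 matters.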

Automation

The model selection procedure is fully automated. Thousands of models with different hyperparameter settings are trained in parallel on our GPU-enabled servers. The trained models compete with each other by analyzing our prepared testing data, and we pick the model with the highest accuracy, put it into PerfPsychic, and use it for online analysis. Moreover, we keep a record of the parameters of the winning models and use them as initial setups in future training runs. Therefore, our models can keep evolving.

PerfPsychic in Application

PerfPsychic is not only a research project but also a widely used internal performance analysis tool: it now automatically analyzes vSAN performance bugs on Bugzilla.

PerfPsychic automatically detects new vSAN performance bugs submitted to Bugzilla and extracts the usable data logs from each bug’s attachments. It then analyzes the logs with the trained models. Finally, the analysis results, including performance enhancement suggestions, are emailed to the bug submitter and the vSAN developer group.

Below is part of an email received yesterday that gives performance tuning advice on a vSAN bug. Internal information is hidden.

Figure 6. Part of email generated by PerfPsychic to offer performance improvement suggestions

 

VMware Speedily Resolves Customer Issues in vSAN Performance Using AI

We in VMware’s Performance team create and maintain various tools to help troubleshoot customer issues. Among these is a new one that allows us to determine storage problems from vast log data much more quickly using artificial intelligence. What used to take us days now takes seconds. PerfPsychic analyzes storage system performance and finds performance bottlenecks using deep learning algorithms.

Let’s examine the benefit the artificial intelligence (AI) models in PerfPsychic bring when we troubleshoot vSAN performance issues. Our trained AI module takes less than 1 second to analyze a vSAN log and pinpoint performance bottlenecks, with an accuracy rate of more than 91%. In contrast, when analyzed manually, an SR ticket on vSAN takes a seasoned performance engineer about one week to deescalate, with durations ranging from 3 to 14 days. AI also beats traditional analysis algorithms, raising the accuracy rate from around 80% to more than 90%.

Architecture

There are two operation modes in the AI module: an off-line training mode and a real-time prediction mode. In training mode, sets of training data, labeled with their performance issues, are automatically fed to all potential convolutional neural network (CNN) [1] structures, which we train repeatedly on GPU-enabled servers. We train thousands of models at a time and promote the one that achieves the best accuracy to the real-time system. In real-time prediction mode, unlabeled user data are sent to the model chosen in the training stage, which provides a prediction of the root cause (the faulty component).

As shown in Figure 1, data in both training and prediction modes are first sent to a data preparation module (Queried Data Preparation), where they are formatted for later stages. The data path then diverges. Let’s first follow the dashed line, the path of labeled training data. The data are sent to the deep learning training module (DL Model Training) to train an ensemble of thousands of CNNs generated from our carefully designed structures. After passing over the training data thousands of times, once the training accuracy converges to a stable value, the trained CNNs compete with each other in the deep learning model selection module (DL Model Selection), where they must predict the root causes of testing data the models have never seen before. Their predictions are compared to the real root causes, labeled by human engineers, to calculate the testing accuracy rate. Finally, we provide an ensemble of the models (Trained DL Model) that achieve the best testing accuracy to the real-time prediction system.

Figure 1: Deep Learning Module Workflow

As you might expect, this training process is both time consuming and resource hungry, so it is carried out off-line on servers equipped with powerful GPUs. Prediction mode, in contrast, is relatively lightweight and can adapt to real-time applications.

Following the solid line in Figure 1 for prediction mode, the unlabeled, normalized user data are sent to our carefully picked models, and the root cause (Performance Exception) is predicted with a small amount of computation. The prediction is returned to the upper layer, such as our interactive analytic web UI, automatic analysis, or proactive analysis applications. As part of the interactive analytics, the web UI also has a means of manually validating the prediction, which automatically triggers the next round of model training. This completes the feedback loop and ensures our models continue to learn from human feedback.

AI Wins Over Manual Debugging

Diagnosing performance problems in a software-defined datacenter (SDDC) is difficult due to both the scale of the systems and the scale of the data. The scale of the software and hardware systems results in complicated behaviors that are not only workload-dependent but also interfere with each other. Thus, pinpointing a root cause requires thorough examination of the entire datacenter. However, due to the scale of data collected across a datacenter, this analysis requires a great deal of human effort, takes an extremely long time, and is prone to errors. Take vSAN for example: dealing with performance-related escalations typically requires cross-departmental efforts examining vSAN stacks, ESXi stacks, and physical/virtual network stacks. In some cases, it has taken many engineers months to pinpoint problems outside of the VMware stack, such as physical network misconfigurations. On average, it takes one week, with many experienced engineers working together, to deescalate a client’s service request ticket.

PerfPsychic is designed to address these challenges and to make performance diagnostics more scalable. It builds on a data infrastructure that is at least 10 times faster and 100 times more scalable than the existing one, and it provides an end-to-end interactive analytic UI that allows users to perform the majority of their analysis in one place. The analysis results are then immediately fed back to the deep learning pipeline in the backend, which produces diagnostic models that detect faulty components more accurately as more feedback is collected. These models typically take only a few hours to train and can detect faulty components in a given dataset in a few milliseconds, with accuracy comparable to rules that took us months to tune manually.

AI Wins Over Traditional Algorithms

To prove the effectiveness of our AI approach, we tested it against traditional machine learning algorithms.

First, we created two datasets: training data and testing data, as summarized in Table 1.

Table 1: Training and Testing Data Property

Training data are generated from our simulated environment: a simple 4-node hybrid vSAN setup. We manually inject performance errors into this environment to collect training data with accurate labels. For example, to simulate a network issue, we have the vmkernel drop a receive packet at the VMK TCP/IP layer for every N packets, mimicking packet drops in the physical network, and we vary N to produce enough data points for training. Although this does not exactly reproduce what happens in a customer environment, it is still a best practice, since it is the only cost-effective way to get a large volume of labeled data that is clean and accurate.
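The packet-drop injection can be sketched as a simple user-space model (the real injection happens inside the vmkernel at the VMK TCP/IP layer):

```python
def drop_every_nth(packets, n):
    """Simulate the fault injection: drop one packet in every n.
    Returns the packets that survive."""
    return [p for i, p in enumerate(packets, start=1) if i % n != 0]

packets = list(range(100))
survivors = drop_every_nth(packets, n=10)        # every 10th packet dropped
loss_rate = 1 - len(survivors) / len(packets)    # 10% simulated loss
```

Sweeping n (the N in the text) produces labeled samples across a range of loss rates.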

The testing data, in contrast, come entirely from customer escalations, which have very different system configurations in many respects (number of hosts, types and number of disks, workloads, and so on). In our testing data, 78.1% of the data points are labeled with performance issues. Note that a “performance issue” refers to a specific component in the system that is causing the performance problem in the dataset. We define “accuracy” as the percentage of components from the testing datasets to which the CNN model assigns the correct label (“issue” or “no issue”).

With the same training data, we trained one CNN and four popular machine learning models: Support Vector Machine (SVM) [2], Logistic Classification (LOG) [3], Multi-layer Perceptron Neural Network (MLP) [4], and Multinomial Naïve Bayes (MNB) [5]. Then we tested the five models against the testing dataset. To quantify model performance, we calculate accuracy as follows.
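Following the definition of accuracy given above:

```latex
\text{accuracy} = \frac{\left|\{\text{components labeled correctly as ``issue'' or ``no issue''}\}\right|}{\left|\{\text{components in the testing dataset}\}\right|} \times 100\%
```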

Finally, we compared the accuracy rates achieved by each model, shown in Figure 2. The result reveals that AI is a clear winner, with 91% accuracy.

Figure 2: Analytic Algorithm Accuracy Comparison

Acknowledgments

We appreciate the assistance and feedback from Chien-Chia Chen, Amitabha Banerjee and Xiaobo Huang. We also feel grateful to the support from our manager Rajesh Somasundaran. Lastly, we thank Julie Brodeur for her help in reviewing and recommendations for this blog post.

References

  1. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going Deeper with Convolutions,” CoRR, abs/1409.4842, 2014.
  2. A. J. Smola and B. Schölkopf, “A Tutorial on Support Vector Regression,” Statistics and Computing, Volume 14, Issue 3, August 2004, pp. 199–222.
  3. C. Bishop, “Pattern Recognition and Machine Learning,” Chapter 4.3.4.
  4. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” http://www.iro.umontreal.ca/~pift6266/A06/refs/backprop_old.pdf.
  5. H. Zhang, “The Optimality of Naive Bayes,” Proc. FLAIRS, 2004, http://www.cs.unb.ca/~hzhang/publications/FLAIRS04ZhangH.pdf.

vSphere with iSER – How to release the full potential of your iSCSI storage!

By Mark Ma

With the release of vSphere 6.7, VMware added iSER (iSCSI Extensions for RDMA) as a natively supported storage protocol in ESXi. With iSER in place of standard iSCSI, users can boost their vSphere storage performance just by replacing regular NICs with RDMA-capable NICs. RDMA (Remote Direct Memory Access) allows one computer to access another computer’s memory directly, minimizing CPU/kernel involvement; by bypassing the kernel, we get extremely high I/O bandwidth and low latency. (To use RDMA, you must have an HCA/host channel adapter device on both the source and destination.) In this blog, we compare standard iSCSI performance vs. iSER performance to see how iSER can release the full potential of your iSCSI storage.

Testbed Configuration

The iSCSI/iSER target system is based on an open source Ubuntu 18.04 LTS server with 2 x E5-2403 v2 CPUs, 96 GB RAM, a 120 GB SSD for the OS, 8 x 450 GB (15K RPM) SAS drives, and a Mellanox ConnectX-3 Pro EN 40 GbE NIC (RDMA capable). The file system is ZFS with 4 mirror sets built from the 8 x 450 GB SAS drives (equivalent to RAID 10). ZFS has an advanced in-memory caching feature that can produce very good random IOPS and read throughput. We did not add any SSD for caching, since the test compares the protocol difference, not disk drives. The iSCSI/iSER target software is the Linux SCSI target framework (TGT).

The iSCSI/iSER initiator is an ESXi 6.7.0 host (build 9214924) with 2 x Intel Xeon E5-2403 v2 CPUs @ 1.8 GHz, 96 GB RAM, a USB boot drive, and a Mellanox ConnectX-3 Pro EN 40 GbE NIC (RDMA capable); we used it to benchmark the performance boost that iSER enables over iSCSI.

Both target and initiator connect to a 40 GbE switch with QSFP cables for optimal network performance.

Both NICs run the latest firmware, version 2.42.5000.

To measure performance, we used VMware I/O Analyzer, which uses the industry-standard benchmark Iometer.

iSCSI Test

We set the target to use the iSCSI driver to ensure the first test measures the standard iSCSI protocol.

Figure 1

For the iSCSI initiator, we simply enable the iSCSI software adapter.

Figure 2

iSCSI test one: Max Read IOPS—this test shows the max read IOPS (4K random read I/Os per second) from the iSCSI storage.

Result: 34,255.18 IOPS

Figure 3

iSCSI test two: Max Write IOPS—this test shows the max write IOPS from the iSCSI storage.

Result: 36,428.26 IOPS

Figure 4

iSCSI test three: Max Read Throughput—this test shows the max read throughput from the iSCSI storage.

Result: 2,740.80 MBPS

Figure 5

iSCSI test four: Max Write Throughput—this test shows the potential max write throughput from the iSCSI storage. (The performance is rather low due to the ZFS RAID configuration and limited disk spindles.)

Result: 112.04 MBPS

Figure 6

iSER Test

We set the target to the iSER driver to ensure the second test measures only iSER connections.

Figure 7

For the iSER initiator, we need to verify that an RDMA-capable NIC is installed. For this, we use the command:
esxcli rdma device list

Figure 8

Then we run the following command from the ESXi host to enable the iSER adapter.

esxcli rdma iser add

 

Figure 9

 

iSER test one: Max Read IOPS—this test shows the max read IOPS from the iSER storage.

Result: 71,108.85 IOPS, which is 207.59% of the iSCSI result.

Figure 10

iSER test two: Max Write IOPS—this test shows the max write IOPS from the iSER storage.

Result: 69,495.70 IOPS, which is 190.77% of the iSCSI result.

Figure 11

iSER test three: Max Read Throughput—this test shows the max read throughput, measured in megabytes per second (MBPS), from the iSER storage.

Result: 4,126.53 MBPS, which is 150.56% of the iSCSI result.

Figure 12

iSER test four: Max Write Throughput—this test shows the max write throughput from the iSER storage. (The performance is rather low due to the ZFS RAID configuration and limited disk spindles.)

Result: 106.48 MBPS, which is about 5% less than iSCSI.

Figure 13

Results

Figure 14

Figure 15

Figure 16

Random I/O performance roughly doubles for both read and write, and read throughput improves by about 50%. Write throughput is about the same. The only difference is the storage protocol. We also performed these tests in an environment made up of older hardware, so just imagine what vSphere with iSER could do when using state-of-the-art, NVMe-based storage and the latest 200 GbE network equipment.
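These ratios follow directly from the Iometer results quoted above:

```python
# (iSCSI, iSER) result pairs from the Iometer runs reported above
results = {
    "read_iops":  (34255.18, 71108.85),
    "write_iops": (36428.26, 69495.70),
    "read_mbps":  (2740.80, 4126.53),
    "write_mbps": (112.04, 106.48),
}

# iSER result as a percentage of the iSCSI result
ratios = {k: round(100 * iser / iscsi, 2) for k, (iscsi, iser) in results.items()}
# -> read_iops: 207.59, write_iops: 190.77, read_mbps: 150.56, write_mbps: 95.04
```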

Conclusion

The results seemed too good to be true, so I ran the benchmark several times to ensure their consistency. It’s great to see VMware’s innovation initiative in action. Who would have thought that the “not so exciting” traditional iSCSI storage protocol could roughly double its performance through the efforts of VMware and Mellanox? It’s great to see VMware continue to push the boundaries of the software-defined datacenter to better serve our customers in their digital transformation journey!

About the Author

Mark Ma is a senior consultant at VMware Professional Services. He is heavily involved with POC, architecture design, assessment, implementation, and user training. Mark specializes in end-to-end virtualization solutions based on Citrix, Microsoft, and VMware applications.

New white paper: Big Data performance on VMware Cloud on AWS: Spark machine learning and IoT analytics performance on-premises and in the cloud

By Dave Jaffe

A new white paper is available comparing Spark machine learning performance on an 8-server on-premises cluster vs. a similarly configured VMware Cloud on AWS cluster.

Here is what the VMware Cloud on AWS cluster looked like:

Screenshot of cluster configuration

VMware Cloud on AWS configuration for performance tests

Three standard analytic programs from the Spark machine learning library (MLlib) were driven using spark-perf: K-means clustering, Logistic Regression classification, and Random Forest decision trees. In addition, a new, VMware-developed benchmark, IoT Analytics Benchmark, which models real-time machine learning on Internet-of-Things data streams, was used in the comparison. The benchmark is available from GitHub.

As seen in the charts below, performance was very similar on-premises and on VMware Cloud on AWS.

Spark machine learning performance chart

Spark machine learning performance

IoT Analytics performance chart

IoT Analytics performance

 

All details are in the paper.

Persistent Memory Performance in vSphere 6.7

We published a paper that shows how VMware is helping advance PMEM technology by driving the virtualization enhancements in vSphere 6.7. The paper gives a detailed performance analysis of using PMEM technology on vSphere using various workloads and scenarios.

These are the key points that we cover in this white paper:

  • We explain how PMEM can be configured and used in a vSphere environment.
  • We show how applications with different characteristics can take advantage of PMEM in vSphere. Below are some of the use-cases:
    • How PMEM device limits can be achieved under vSphere with little to no overhead of virtualization. We show virtual-to-native ratio along with raw bandwidth and latency numbers from fio, an I/O microbenchmark.
    • How traditional relational databases like Oracle can benefit from using PMEM in vSphere.
    • How scaling-out VMs in vSphere can benefit from PMEM. We used Sysbench with MySQL to show such benefits.
    • How modifying applications (PMEM-aware) can get the best performance out of PMEM. We show performance data from such applications, e.g., an OLTP database like SQL Server and an in-memory database like Redis.
    • Using vMotion to migrate VMs with PMEM, which is a host-local device just like NVMe SSDs. We also characterize the vMotion performance of VMs with PMEM in detail.
  • We outline some best practices on how to get the most out of PMEM in vSphere.

Read the full paper here.

Performance Best Practices Guide for vSphere 6.7

We are pleased to announce the availability of Performance Best Practices for VMware vSphere 6.7. This is a comprehensive book designed to help system administrators obtain the best performance from their vSphere 6.7 deployments.

The book covers new features as well as updating and expanding on many of the topics covered in previous versions.

These include:

  • Hardware-assisted virtualization
  • Storage hardware considerations
  • Network hardware considerations
  • Memory page sharing
  • Getting the best performance with iSCSI and NFS storage
  • Getting the best performance from NVMe drives
  • vSphere virtual machine encryption recommendations
  • Running storage latency-sensitive workloads
  • Network I/O Control (NetIOC)
  • DirectPath I/O
  • Running network latency-sensitive workloads
  • Microsoft Virtualization-Based Security (VBS)
  • CPU Hot Add
  • 4KB native drives
  • Selecting virtual network adapters
  • The vSphere HTML5 Client
  • vSphere web client configuration
  • Pair-wise balancing in DRS-enabled clusters
  • VMware vSphere update manager
  • VMware vSAN performance

The book can be found here.

Also, for a summary of the new performance-related features in vSphere 6.7, refer to What’s New in Performance.

Oracle Database Performance with VMware Cloud on AWS

You’ve probably already heard about VMware Cloud on Amazon Web Services (VMC on AWS). It’s the same vSphere platform that has been running business critical applications for years, but now it’s available on Amazon’s cloud infrastructure. Following up on the many tests that we have done with Oracle databases on vSphere, I was able to get some time on a VMC on AWS setup to see how Oracle databases perform in this new environment.

It is important to note that VMC on AWS is vSphere running on bare metal servers in Amazon’s infrastructure. The expectation is that performance will be very similar to “regular” onsite vSphere, with the added advantage that the hardware provisioning, software installation, and configuration are already done and the environment is ready to go when you log in. The vCenter interface is the same, except that it references the Amazon instance type for the server.

Our VMC on AWS instance is made up of four ESXi hosts. Each host has two 18-core Intel Xeon E5-2686 v4 (aka Broadwell) processors and 512 GB of RAM. In total, the cluster has 144 cores and 2 TB of RAM, which gives us lots of physical resources to utilize in the cloud.

In our test, the database VMs ran Red Hat Enterprise Linux 7.2 with Oracle 12c. To drive a load against the database VMs, a single 18-vCPU driver VM running Windows Server 2012 R2 with the DVD Store 3 test workload was also set up on the cluster. A 100 GB DS3 test database was created on each of the Oracle database VMs. During testing, the number of threads driving load against the databases was increased until maximum throughput was achieved, at around 95% CPU utilization. The total throughput across all database servers for each test is shown below.

 

In this test, the DB VMs were configured with 16 vCPUs and 128 GB of RAM. In the 8-VM test case, a total of 128 vCPUs were allocated across the 144 cores of the cluster; the cluster was also running the 18-vCPU driver VM, vCenter, vSAN, and NSX. This makes the 12-VM test case interesting: there were 192 vCPUs for the DB VMs, plus 18 vCPUs for the driver. The hyperthreads clearly help, allowing performance to continue to scale even though more vCPUs are allocated than there are physical cores.

The performance itself represents scaling very similar to what we have seen with Oracle and other database workloads with vSphere in recent releases. The cluster was able to achieve over 370 thousand orders per minute with good scaling from 1 VM to 12 VMs. We also recently published similar tests with SQL Server on the same VMC on AWS cluster, but with a different workload and more, smaller VMs.

UPDATE (07/30/2018): The whitepaper detailing these results is now available here.