You may have noticed that vSphere 6.5 Update 1 was just released. In this release, VMware introduced a new feature called vSAN Performance Diagnostics. It lets you pick a time range in the past, select a goal from “Max IOPS”, “Max Throughput” and “Min Latency”, and click the Show results button to see potential issues found by analyzing the performance data for that period. Based on those findings, you can adjust the test workload to get better performance results. Both the Customer Experience Improvement Program (CEIP) and the vSAN Performance Service must be turned on for this feature to be available.

From a performance testing perspective, this feature is extremely useful as a guide to what might be wrong with the workload running on vSAN. For each issue you can see the details, along with an “Ask VMware” option that links to the KB article about that issue.

With this exciting new feature available, we figured HCIBench should take advantage of it to help you reach the performance you expect from vSAN. So we integrated vSAN Performance Diagnostics with HCIBench as an experimental feature and published HCIBench version 1.6.2 on the Fling page.

To take full advantage of this awesome integration in your environment, you need to:

  • Upgrade vCenter to 6.5 Update 1
  • Turn on the Customer Experience Improvement Program
  • Turn on the vSAN Performance Service
  • Upgrade HCIBench to 1.6.2
  • Run HCIBench against a vSAN datastore

vSAN Performance Diagnostics Integration

Once the prerequisites above are met, HCIBench takes care of everything else. As an example, we ran a test with an inappropriate configuration: only two VMs, each with four VMDKs, deployed on a four-node All-Flash vSAN cluster. The workload was 4k blocks, 70% read, 100% random, with a 30-minute warmup followed by a 60-minute test.

When we checked the results file, we sadly found that we got only 56,000 IOPS, so we clearly needed to tune the test configuration to get more. The sentence at the bottom of the results page points to where the performance diagnostics results can be found. Let’s locate them and take a closer look.
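As a rough sanity check on that number, IOPS can be converted to throughput by multiplying by the block size. The sketch below assumes every I/O is exactly 4 KiB (4096 bytes), matching the 4k workload above. The function name is ours for illustration and is not part of HCIBench.

```python
def iops_to_mib_per_s(iops: float, block_size_bytes: int) -> float:
    """Convert an IOPS figure to throughput in MiB/s for a fixed block size."""
    return iops * block_size_bytes / (1024 * 1024)


# 56,000 IOPS at an assumed 4096-byte block size.
print(f"{iops_to_mib_per_s(56_000, 4096):.2f} MiB/s")
```

This is also why the three goals in the diagnostics page are related: at a fixed block size, more IOPS means proportionally more throughput.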

The HTML page offers three options: you can select your goal as getting more IOPS, more THROUGHPUT, or less LATENCY. These correspond to “Max IOPS”, “Max Throughput” and “Min Latency” in vCenter.

We are targeting more IOPS here, so we selected “To Get More I/O Per Second”. The “Potential Issue” message is returned from vCenter through a call to the vSAN Performance Diagnostics API. “Ask VMware” takes us to the KB article for the issue we encountered, and the “vCenter” hyperlink lands us directly on the Monitor page of that particular vSAN cluster in the vCenter web client.

Since we had only two VMs deployed, not all the physical drives or disk groups were used, and the limited number of VMs and VMDKs meant that very few outstanding I/Os were pushed down to the storage. In short, we did not leverage all the hardware resources, and the workload was far too light.

After clicking “vCenter” and entering vCenter credentials at the login page, we land on the vSAN Performance Diagnostics page of the vSAN cluster we tested. By selecting “Max IOPS” as the benchmark goal, choosing “HCIBench-vdb-4vmdk-100ws-4k-70rdpct-100randompct-4threads-1501264491” from the time range drop-down list, and clicking “Show results”, we can see all the details of the two potential issues that were listed in the performance_diag_results HTML page.
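The entry in the time range drop-down list encodes the workload parameters, so it can be handy to split it back into fields when sorting through several runs. The sketch below is based only on the single name shown above. The field meanings are our assumptions: “100ws” is read as a working-set percentage, and the trailing number is treated as a Unix epoch timestamp. HCIBench may use other name layouts that this pattern does not cover.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from one example name; the field names are our own labels.
NAME_PATTERN = re.compile(
    r"^HCIBench-vdb-(?P<vmdks>\d+)vmdk-(?P<working_set_pct>\d+)ws-"
    r"(?P<block_size>\d+[A-Za-z]*)-(?P<read_pct>\d+)rdpct-"
    r"(?P<random_pct>\d+)randompct-(?P<threads>\d+)threads-(?P<timestamp>\d+)$"
)


def parse_test_name(name: str) -> dict:
    """Split an HCIBench-style test name into its workload parameters."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized test name: {name!r}")
    fields = match.groupdict()
    parsed = {
        key: int(fields[key])
        for key in ("vmdks", "working_set_pct", "read_pct", "random_pct", "threads")
    }
    parsed["block_size"] = fields["block_size"]
    # Assumption: the trailing number is seconds since the Unix epoch (UTC).
    parsed["started"] = datetime.fromtimestamp(int(fields["timestamp"]), tz=timezone.utc)
    return parsed


info = parse_test_name(
    "HCIBench-vdb-4vmdk-100ws-4k-70rdpct-100randompct-4threads-1501264491"
)
print(info["vmdks"], info["block_size"], info["read_pct"], info["threads"])
print(info["started"].isoformat())
```

Raising `ValueError` on a name that does not match keeps a misread layout from silently producing wrong parameters.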

To get more physical drives and disk groups involved in handling the workload, we re-ran the test with the number of VMs increased to 16, which also raised the total number of outstanding I/Os across the cluster.
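To put rough numbers on that, the upper bound on outstanding I/Os can be estimated as VMs × VMDKs per VM × threads per VMDK. The sketch below assumes 4 threads per VMDK (taken from “4threads” in the test name), that each thread keeps at most one I/O in flight, and that the second run kept four VMDKs per VM and the same thread count. None of these assumptions is confirmed beyond what the test name suggests.

```python
def total_outstanding_io(vms: int, vmdks_per_vm: int, threads_per_vmdk: int) -> int:
    """Upper bound on I/Os in flight, assuming one outstanding I/O per thread."""
    return vms * vmdks_per_vm * threads_per_vmdk


first_run = total_outstanding_io(vms=2, vmdks_per_vm=4, threads_per_vmdk=4)
second_run = total_outstanding_io(vms=16, vmdks_per_vm=4, threads_per_vmdk=4)

print(f"first run:  {first_run} outstanding I/Os")
print(f"second run: {second_run} outstanding I/Os")
print(f"increase:   {second_run // first_run}x")
```

Under these assumptions the cluster goes from 32 to 256 I/Os in flight, which is the kind of change the diagnostics were asking for.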

The results showed that with more VMs the IOPS were far better than in the previous test; latency was about 600 microseconds higher but still acceptable. On the performance diagnostics results page, no potential issues were found this time.

With this integration, it was pretty easy to figure out what was wrong with our performance test from a configuration perspective. It can help vSAN users reach their performance goals much more easily by using vSAN Performance Diagnostics to determine what should be changed in the workload configuration.

DHCP Service Validation

Another new feature of HCIBench 1.6.2 that I want to mention is DHCP service validation, added to the pre-validation process. Network configuration is the primary pain point for users having trouble with HCIBench. Some users found that pre-validation succeeded but deployment then failed. Most of those cases were due to a network misconfiguration that blocked the DHCP service.

Essentially, DHCP service validation works by deploying one tiny VM onto each of the hosts involved and mapping it to the network the user specified. If any of the tiny VMs can’t be deployed or fails to get an IP address, the DHCP validation fails and tells the user to re-check the network configuration, sparing them the frustration of a failed test later.

Although the tiny size of the VM (about 10MB) helps minimize the time DHCP validation takes, this feature still adds a few minutes to the overall validation process. In return, it helps users be better prepared before starting a test.
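The pass/fail rule described above can be sketched in a few lines. This is not HCIBench’s actual code: the `ProbeResult` type, the `failed_hosts` function, and the host names and address are all hypothetical, and the step that really deploys the tiny VMs is left out. It only illustrates the decision that one failed probe on any host fails the whole validation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    """Outcome of deploying one tiny test VM on a host (hypothetical type)."""
    host: str
    deployed: bool
    ip_address: Optional[str]  # None if the VM never obtained a DHCP lease


def failed_hosts(results: List[ProbeResult]) -> List[str]:
    """Return the hosts whose probe VM was not deployed or got no IP address."""
    return [r.host for r in results if not r.deployed or not r.ip_address]


# Example with made-up hosts: the second probe VM never received an address.
results = [
    ProbeResult("esx-01", deployed=True, ip_address="192.168.1.101"),
    ProbeResult("esx-02", deployed=True, ip_address=None),
    ProbeResult("esx-03", deployed=False, ip_address=None),
]

bad = failed_hosts(results)
if bad:
    print("DHCP validation failed, re-check the network on:", ", ".join(bad))
else:
    print("DHCP validation passed")
```

Reporting the failing hosts by name, rather than a bare pass or fail, makes it easier to spot a port group or VLAN that is misconfigured on only some hosts.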

Alongside those two major features, there are of course more enhancements and bug fixes in HCIBench 1.6.2. Download it today and let me know if you like it.