
As promised last time, we are going to talk about how to use the backend scripts to make performance testing more efficient, as well as how to troubleshoot HCIBench if you run into an issue. This article may be updated from time to time as more issues are reported. On Oct. 26th, 2016, we published HCIBench version 1.5.0.5; update it today if you haven't.

First, let me introduce some backend commands that can make your life easier. As you may already know, HCIBench is essentially a wrapper, and the core executables are located in "/opt/automation".

[Screenshot: the contents of /opt/automation]

And the configuration file can be found at “/opt/automation/conf/perf-conf.yaml” if the configuration has been saved from the web UI.

[Screenshot: the perf-conf.yaml configuration file]
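
Only two keys of this file are referenced in this article; below is a hypothetical excerpt (the other field names vary by HCIBench version and are not shown):

```yaml
# Hypothetical excerpt of /opt/automation/conf/perf-conf.yaml.
# Only these two keys are named in this article; everything else in
# the real file varies by version.
output_path: 4hours-run-on-vsan        # the "Test Name" field on the Web UI
self_defined_param_file_path: /opt/automation/vdbench-param-files
```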

The table below shows the scripts and their usage:

  Script                      Usage
  ----------------------      ------------------------------------------------
  pre-validate-config.sh      Validates the settings saved in perf-conf.yaml
  deploy-vms.sh               Deploys the client VMs
  warm-up-disk.sh             Initializes the client VMs' data disks
  io-test.sh                  Runs the Vdbench I/O test
  cleanup-vm.sh               Deletes the client VMs after testing
  start-testing.sh            Runs the whole workflow end to end
  vdb-process.sh              Generates the performance summary file

The script start-testing.sh runs 4 main steps: deploy-vms.sh -> warm-up-disk.sh (if needed) -> io-test.sh -> cleanup-vm.sh (if needed). The scripts above give users the ability to operate HCIBench as required. Below are a couple of examples of leveraging the scripts.
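
Conceptually, the chain can be sketched as a small shell function (illustrative only; `run_pipeline` is not an HCIBench script, and the script directory is parameterized so the sketch can be dry-run):

```shell
# Sketch of the start-testing.sh flow described above, not the actual
# HCIBench implementation. On the appliance the script directory is
# /opt/automation; warmup/cleanup mirror the optional warm-up-disk.sh
# and cleanup-vm.sh steps.
run_pipeline() {
  automation="$1"; warmup="$2"; cleanup="$3"
  "$automation/deploy-vms.sh" || return 1
  if [ "$warmup" = "yes" ]; then
    "$automation/warm-up-disk.sh" || return 1
  fi
  "$automation/io-test.sh" || return 1
  if [ "$cleanup" = "yes" ]; then
    "$automation/cleanup-vm.sh" || return 1
  fi
}
```

On the appliance, a full run with warm-up and cleanup would amount to `run_pipeline /opt/automation yes yes`.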

How to re-use VMs for Vdbench testing

As you may know, client VM deployment and the storage initialization process may take a couple of hours to complete before the Vdbench test starts running. Since deployment and storage initialization are a one-time effort, the VMs can be re-used for additional Vdbench tests if needed.

When you're configuring the testing parameters from the Web UI for the first time, make sure "Clean Up VMs after testing" is unchecked. After the testing initiated from the Web UI has completed, you can follow the steps below to re-use those client VMs for additional tests:

  1. Put all the Vdbench workload profiles into “/opt/automation/vdbench-param-files”.
  2. Edit the file "/opt/automation/conf/perf-conf.yaml" to ensure the value of "self_defined_param_file_path" is "/opt/automation/vdbench-param-files". You may also edit the test name, "output_path" (equal to the Test Name field on the Web UI). Do not edit any other entries in this file.
  3. If “Enable DHCP Service on the Network” was checked, you should manually turn on DHCP Service on HCIBench by running “ifconfig eth1 up; service dhcp start”.
  4. Run “/opt/automation/pre-validate-config.sh” to re-verify the settings.
  5. Reboot all your client VMs from vSphere client.
  6. Wait approximately 2 to 3 minutes, then run "ruby /opt/automation/lib/get-vm-ip.rb" on HCIBench to collect all the client VM IPs into /opt/automation/tmp/vm.yaml.
  7. Run “/opt/automation/io-test.sh” to start Vdbench testing.
  8. You can monitor the process by running "ps -ef | grep 'Vdbench-io-test.rb'" to check whether it has completed.
  9. All the results can be found under "/opt/output/results/TEST_NAME/".
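
The scripted part of the steps above (4, 6, and 7) can be sketched as one function (illustrative; `rerun_vdbench` is not an HCIBench script, and the directory and wait time are parameterized so the sketch can be dry-run):

```shell
# Sketch of steps 4, 6, and 7 above; the manual edits (steps 1-3) and
# the VM reboot (step 5) are assumed done. On the appliance the script
# directory is /opt/automation and the boot wait is roughly 180 seconds.
rerun_vdbench() {
  automation="$1"
  wait_secs="${2:-180}"
  "$automation/pre-validate-config.sh" || return 1  # step 4: re-verify settings
  sleep "$wait_secs"                                # step 6: let client VMs boot
  ruby "$automation/lib/get-vm-ip.rb" || return 1   # step 6: collect VM IPs
  "$automation/io-test.sh"                          # step 7: start Vdbench
}
```

On the appliance this would be invoked as `rerun_vdbench /opt/automation`.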

How to deploy client VMs without I/O test

The script deploy-vms.sh can be executed on its own for client VM deployment. Before running this script, make sure perf-conf.yaml is properly configured. Besides running pre-validate-config.sh before the deployment script, you should also check whether there are client VMs in the cluster but outside the folder "vdb-CLUSTER_NAME-vms"; these should be deleted before deploying or starting a test. After running "/opt/automation/pre-validate-config.sh", you can start the deployment by running "/opt/automation/deploy-vms.sh" and monitor the process by running "ps -ef | grep 'deploy-vms.rb'".
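
Rather than re-running the ps check by hand, a small polling loop can block until the deployment process finishes. A sketch (the `wait_for` helper is illustrative, not part of HCIBench):

```shell
# Poll until no process whose command line matches the pattern remains,
# mirroring the manual "ps -ef | grep 'deploy-vms.rb'" check above.
wait_for() {
  pattern="$1"
  interval="${2:-10}"
  while ps -ef | grep -v grep | grep -q "$pattern"; do
    sleep "$interval"
  done
  echo "$pattern finished"
}
```

After kicking off "/opt/automation/deploy-vms.sh", `wait_for 'deploy-vms.rb' 30` returns once the deployment worker has exited; the same approach works for the io-test worker.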

How to process the results to a summary file manually

The script vdb-process.sh can be used to generate the performance summary file. When you have a results directory ready, for example "/opt/output/results/4hours-run-on-vsan/vdb-5vmdk-20ws-8k-35rdpct-60randompct-5threads-1478047505", it should contain the Vdbench raw results as well as the directory of vSAN Observer files. Find the name(s) of the datastore(s) that were tested against in perf-conf.yaml, then run "/opt/automation/vdb-process.sh <RESULTS_DIR> <DATASTORE1_NAME> <DATASTORE2_NAME>". You will get the summary file "/opt/output/results/4hours-run-on-vsan/vdb-5vmdk-20ws-8k-35rdpct-60randompct-5threads-1478047505-res.txt" as well as the console output.

[Screenshot: summary file console output]
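
The invocation and the resulting summary path can be wrapped into a small helper (illustrative; `process_results` is not an HCIBench script — the vdb-process.sh call and the <RESULTS_DIR>-res.txt naming follow the example above):

```shell
# Run vdb-process.sh against a results directory and print where the
# summary file lands (<RESULTS_DIR>-res.txt, per the example above).
# The script directory is a parameter so the sketch can be dry-run;
# on the appliance it is /opt/automation.
process_results() {
  automation="$1"; results_dir="${2%/}"; shift 2
  "$automation/vdb-process.sh" "$results_dir" "$@" || return 1
  echo "summary: ${results_dir}-res.txt"
}
```

For example: `process_results /opt/automation /opt/output/results/TEST_NAME/RUN_DIR vsanDatastore`.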

Troubleshooting

Now let's move on to troubleshooting. Since the pre-validation script cannot cover all the corner cases in different vSphere environments, failures may happen occasionally.

Deployment failure

When a deployment failure occurs, the first thing to check is the deployment log. With the new version 1.5.0.5, you can review the HCIBench logs by visiting http://HCIBench_IP/hcibench_logs/. If you left "Deploy on Hosts" unchecked, check the log file named vc-VC_IP-vm-deploy.log; if you had that box checked, check host-HOST_IP-vm-deploy.log instead. There you can see whether the client VMs failed to get IP addresses or were unreachable.

In most failed deployments, network issues are the root cause. SSH and DHCP traffic must be allowed between the HCIBench VM and the VLAN on which the client VMs will be deployed. Let's review a few failure cases:

VMs failed to get IP addresses

If you are using HCIBench to provide DHCP service (that is, you have "Enable DHCP Service on the Network" checked) and VMs fail to get IP addresses, there are several things to check:

  1. The status of the DHCP service: run "ifconfig eth1 up; service dhcp start", then check it with "systemctl status dhcp". Leave the DHCP service running.
  2. The connectivity between the HCIBench controller VM and all the ESXi hosts on the VLAN (VLAN012, for example) that the client VMs are deployed on. To check this, you can:
    2a. Create a vmkernel adapter on each of the hosts for VLAN012 and see if they can get IP addresses.
    2b. If the vmkernels can't get IP addresses, set static IPs on those vmkernels using 192.168.2.x/18 and try to ping the addresses from HCIBench.
    If step 2b fails, you have a network connectivity issue on VLAN012; if 2b succeeds but 2a fails, VLAN012 is blocking DHCP traffic.
  3. Resolve the issue.
  4. Set the number of VMs in perf-conf.yaml to the number of ESXi hosts in the cluster, with 1 data disk per VM, to shorten deployment time for testing purposes, then run "/opt/automation/deploy-vms.sh" once "/opt/automation/pre-validate-config.sh" completes successfully. Check whether all the VMs get IP addresses.
  5. If all the VMs can get IPs, change the number of VMs and data disks back and re-run the testing.
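
Step 2b's reachability check can be looped over the static vmkernel IPs, for example (a sketch; the `check_vmkernels` helper is illustrative, and the IPs are whatever you assigned in step 2b):

```shell
# Ping each static vmkernel IP from the HCIBench VM and report which
# ones respond (step 2b above): one echo request, 2-second timeout.
check_vmkernels() {
  for ip in "$@"; do
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      echo "$ip reachable"
    else
      echo "$ip unreachable"
    fi
  done
}
```

For example: `check_vmkernels 192.168.2.11 192.168.2.12 192.168.2.13`.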

If you have your own DHCP service in the VLAN, execute step 4 above. After the VMs are deployed, check the client VMs on each ESXi host to see whether they have IP addresses. If not all of them do, consider whether the DHCP IP range is big enough.

HCIBench can’t resolve the vCenter hostname or the ESXi hostname

If you are having an issue where HCIBench is not able to resolve the vCenter or ESXi hostnames, you can configure DNS in the file /etc/systemd/network/eth0.network by following this guide. Or, you can manually add the hostnames and IP addresses to /etc/hosts.
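
Static hostname-to-IP mappings live in /etc/hosts, one host per line (the addresses and names below are example values only):

```
# Example /etc/hosts entries (hypothetical addresses and names)
10.0.0.10   vcenter.example.local
10.0.0.21   esxi01.example.local
10.0.0.22   esxi02.example.local
```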

The VLAN has DHCP service and the client VMs can get IP addresses, but they are not accessible from HCIBench

This means the VLAN has DHCP service, but HCIBench doesn't have a route to it. When setting up the route, a layer-2 connection is preferable to a layer-3 connection, as it minimizes potential disconnection or timeout issues caused by layer-3 policies. To do this:

  1. Map the HCIBench private network adapter to the same VLAN.
  2. Edit the private network adapter configuration file /etc/systemd/network/eth1.network as follows and reboot the server.
    [Screenshot: eth1.network configuration]
  3. After rebooting, run “ifconfig eth1 up; ifconfig” to check if eth1 has a valid IP.
  4. Check "Enable DHCP Service on the Network" and save the configuration, then kick off the testing again.
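
The eth1.network change in step 2 typically just switches the private adapter to DHCP. A minimal systemd-networkd sketch (the exact contents shipped with HCIBench may differ):

```ini
# /etc/systemd/network/eth1.network -- minimal sketch, assuming the
# adapter should obtain its address from the VLAN's DHCP service
[Match]
Name=eth1

[Network]
DHCP=yes
```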

The above 3 cases are the ones most frequently reported by HCIBench users. If you run into any kind of issue or difficulty while using HCIBench, you are more than welcome to report it to vsanperformance@vmware.com. Part 3 will contain more troubleshooting details, so stay tuned!