
VMware Virtual SAN for Remote Office and Branch Office (ROBO)

As a global company, VMware has many office locations throughout the world. Being based in Beijing, I hear from many customers who would love to deploy VMware Virtual SAN at a Remote Office and Branch Office (ROBO) site. However, they have concerns and questions about performance and reliability. Well, question no more. This blog is going to knock your socks off.

For a cost-effective ROBO deployment, VMware customers want all of the benefits of manageability, performance, and availability in a Virtual SAN Cluster without the minimum requirement of three nodes. Virtual SAN 6.1 introduced Virtual SAN for ROBO, a solution specifically designed to satisfy the demands of smaller ROBO environments.

The Virtual SAN ROBO solution delivers the same enterprise-class performance and availability as the enterprise edition. To test the architecture, we designed a ROBO deployment based on feedback from customers, using a 2-node Hybrid VMware Virtual SAN Cluster running a combination of VDI, Windows File Share, and a small SQL Server database server.

As shown in the figure, the ROBO architecture is composed of two parts: the central Data Center containing the management cluster and a ROBO office.  The central Data Center is where central IT management occurs and the remote 2-node Virtual SAN Cluster carries the real workloads.

ROBO_Arch

 

Server Configuration

Each ESXi server in the ROBO 2-node Virtual SAN Cluster has the following configuration.

ROBO_ServerSpecs

Infrastructure VMs Configuration

Single vCenter Server with the following roles:

  • vCenter
  • vCenter single sign-on (SSO)
  • vCenter Inventory Service

Windows Active Directory Server

Infrastructure VM Configuration – Hosted on Management Cluster

ROBO_Infra_VM

 

Network Emulation

As shown in the figure below, we use two Ubuntu 14.04 virtual machines acting as software routers. Each Ubuntu VM has 3 vNICs. A bandwidth limit and latency are added to the vNIC connecting to VLAN5 by using the Linux ‘tc’ command (see this blog for details of how to set up the bandwidth limit and latency). In the test setup, the bandwidth limit is set to 1.5 Mbps and the latency is set to an RTT of 500 ms to simulate the WAN connection between the central Data Center and a ROBO office. Both management and Virtual SAN traffic share the same VLAN5.
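To make the emulation step concrete, here is a minimal sketch in Python that only builds (and prints) the kind of ‘tc’ invocations a software router could use. It is not the exact configuration from our lab: the interface name `eth1`, the split of the 500 ms RTT into 250 ms of one-way delay per direction, and the `burst`/`latency` values for the token bucket filter are all assumptions chosen for illustration.

```python
from typing import List


def build_wan_emulation_commands(
    iface: str, rate_kbit: int, one_way_delay_ms: int
) -> List[List[str]]:
    """Build tc commands that add delay (netem) and a rate cap (tbf) on iface.

    The commands are returned as argument lists and are NOT executed here.
    """
    # Root qdisc: netem adds a fixed delay to every outgoing packet.
    netem = [
        "tc", "qdisc", "add", "dev", iface, "root", "handle", "1:0",
        "netem", "delay", f"{one_way_delay_ms}ms",
    ]
    # Child qdisc: a token bucket filter caps the outgoing rate.
    # burst and latency are illustrative values, not taken from the test setup.
    tbf = [
        "tc", "qdisc", "add", "dev", iface, "parent", "1:1", "handle", "10:",
        "tbf", "rate", f"{rate_kbit}kbit", "burst", "32kbit", "latency", "400ms",
    ]
    return [netem, tbf]


if __name__ == "__main__":
    # Assumption: eth1 is the vNIC facing VLAN5; 250 ms each way gives ~500 ms RTT
    # when the same delay is applied on the router in the opposite direction.
    for cmd in build_wan_emulation_commands("eth1", rate_kbit=1500, one_way_delay_ms=250):
        print(" ".join(cmd))
```

The printed lines would need to be run as root on the router VM. Because ‘tc’ shapes outgoing traffic, the delay has to be applied on the egress side for each direction to reach the full round-trip figure.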

 

Minimum Network Connectivity Requirements for Witness Virtual Appliance

  • At least 1.5 Mbps connectivity
  • Latency of up to 500 milliseconds RTT
  • Layer 3 network connectivity without multicast to the nodes in the cluster

 

ROBO_network

Note: the network configuration appears a bit complicated for a ROBO setup. This is because of the WAN emulation we have to do in the lab environment. A real-world ROBO deployment doesn’t necessarily require such a complex network configuration.

 

Workloads and Benchmark Tool

During testing, there are three types of workloads running concurrently in the ROBO site:

  • TPC-E-like workload with a small SQL Server database
  • VDI workload with a number of VMware Horizon View virtual desktops
  • File server workload with a Windows 2012 file share server

These workloads represent typical IT requirements in a ROBO environment. The following virtual machines are configured and tested:

ROBO_VDI_VMs

SQL Server

We used Benchmark Factory with a scale factor of 5 to test SQL Server and measure TPC-E-like performance in terms of TPS (transactions per second) and latency. As the Benchmark Factory scale factor increases, more rows are added to the tables, which allows for larger user loads.

 

VMware Horizon virtual desktops

We used Login VSI in Benchmark mode with 20 sessions to measure VDI performance in terms of the VSImax score. The workload type is ‘knowledge worker* 2vCPU’.

Windows file share server

We used DBENCH (https://dbench.samba.org/) to measure file server performance in terms of bandwidth and latency.

We created 20 files of 100MB each in the share folder. Two virtual clients act as users accessing the share folder concurrently. Each virtual user picks a random file, then reads from and writes to it. The user then picks another file and moves on to the next iteration.
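The access pattern described above can be sketched in a few lines of Python. This is not how DBENCH is implemented, and it does not reproduce our measurements. It is a scaled-down illustration that runs against a local temporary directory instead of a Windows share, and the file size, block size, iteration count, and all function names are assumptions.

```python
import os
import random
import tempfile
import threading
import time


def create_files(directory, count, size_bytes):
    """Create `count` files of `size_bytes` random bytes each; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"file_{i:02d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size_bytes))
        paths.append(path)
    return paths


def run_client(paths, iterations, block_size, totals, lock, seed):
    """One virtual user: pick a random file, read it fully, overwrite one block."""
    rng = random.Random(seed)
    moved = 0
    for _ in range(iterations):
        path = rng.choice(paths)
        with open(path, "r+b") as f:
            data = f.read()
            moved += len(data)
            # Overwrite a block inside the file so its size does not change.
            offset = rng.randrange(0, max(1, len(data) - block_size + 1))
            f.seek(offset)
            f.write(os.urandom(block_size))
            moved += block_size
    with lock:
        totals.append(moved)


def simulate(file_count=20, size_bytes=64 * 1024, clients=2,
             iterations=10, block_size=4096):
    """Run the clients concurrently; return (total bytes moved, elapsed seconds)."""
    with tempfile.TemporaryDirectory() as directory:
        paths = create_files(directory, file_count, size_bytes)
        totals, lock = [], threading.Lock()
        threads = [
            threading.Thread(target=run_client,
                             args=(paths, iterations, block_size, totals, lock, i))
            for i in range(clients)
        ]
        start = time.perf_counter()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        elapsed = time.perf_counter() - start
    return sum(totals), elapsed


if __name__ == "__main__":
    total_bytes, seconds = simulate()
    print(f"moved {total_bytes} bytes in {seconds:.3f} s")
```

The default file size here is 64 KiB rather than 100MB so the sketch finishes quickly. Pointing `create_files` at a mounted share and raising `size_bytes` would bring it closer to the scenario in the test.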

 

Key Results

The following highlights validate that Virtual SAN is a high-performance storage solution suitable for ROBO even in a modest 2-node configuration. The data reflects all 3 workloads running concurrently. Each test lasts 1 hour.

To summarize, the tests show that a 2-node ROBO Virtual SAN Cluster can support consolidated applications and mixed workloads: 20 virtual desktops deployed in 43 minutes with a Login VSI score of 656, 650+ database TPS, and 50+ MB/s of file server throughput.

We conducted two separate tests: one without any injected error, and one with a single SSD failure injected during the run. The results show that a 2-node ROBO Virtual SAN Cluster handles hardware failure well, with no data loss and limited performance impact.

LoginVSIScore

The first chart shows that in the base test, without injecting a disk error, we achieved a Login VSI score of 656. When we inject an SSD error during the test, the Login VSI score increases only slightly, to 658. The VSImax value is not reached in either situation. Remember that a lower Login VSI score is better. This comparison shows that one SSD failure makes just a slight difference.

ROBO_BMF_Result

This chart shows the Benchmark Factory result. Baseline testing achieved a TPS of 700, with an average response time of 3ms and an average transaction time of 28ms. If we inject an SSD error during the test, the TPS drops to 683. The average response time stays at 3ms and the average transaction time increases slightly to 29ms.

ROBO_File_Server_result

This chart shows the DBENCH result for file transfers. In the base test without an injected failure, we got an average bandwidth of 52.31MB/s. The average read latency is 0.595ms and the average write latency is 0.275ms. If we inject an SSD error during the test, the average bandwidth drops to 48.82MB/s. The average read latency increases to 0.787ms and the average write latency increases to 0.286ms. Again, we observe only a slight performance change when an SSD failure occurs.
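To put the baseline and SSD-failure numbers side by side, the short Python sketch below computes the relative change for each metric quoted above. The figures are copied from this post. The dictionary layout and function name are just illustrative choices.

```python
def percent_change(baseline, with_failure):
    """Relative change from the baseline run to the SSD-failure run, in percent."""
    return (with_failure - baseline) / baseline * 100.0


# (baseline, with one SSD failure), as reported in the three charts.
RESULTS = {
    "Login VSI score (lower is better)": (656, 658),
    "Database TPS": (700, 683),
    "Avg transaction time (ms)": (28, 29),
    "File server bandwidth (MB/s)": (52.31, 48.82),
    "File server read latency (ms)": (0.595, 0.787),
    "File server write latency (ms)": (0.275, 0.286),
}

if __name__ == "__main__":
    for metric, (base, failed) in RESULTS.items():
        print(f"{metric}: {base} -> {failed} ({percent_change(base, failed):+.2f}%)")
```

By this arithmetic, TPS drops by roughly 2.4% and file server bandwidth by roughly 6.7%. The read latency shows the largest relative change (about 32%), although in absolute terms it stays below 1ms.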

From the above results, we can see that a 2-node ROBO Virtual SAN Cluster can support consolidated applications and mixed workloads. Furthermore, the injected SSD failure caused no data loss and had only a slight impact on performance.

In summary, VMware Virtual SAN for ROBO is not only simple to configure and manage, it also provides the same enterprise performance and can handle mixed workloads. Furthermore, Virtual SAN for ROBO delivers the resiliency required for any ROBO solution. This blog is a preview of a comprehensive reference architecture paper that is being published very soon. Stay tuned.