This blog was co-authored by Kathryn Erickson, Director of Strategic Partnerships at DataStax.
DataStax and VMware have been diligently working to deliver on the first fruits of this partnership. Today we outline the first step: DataStax Enterprise (DSE) on vSAN for development environments, and we provide some insights into what may come.
VMware and DataStax have jointly undertaken an extensive technical validation to demonstrate VMware vSAN™ as a storage platform for globally distributed cloud applications in test and development environments. The companies are working together on shared-nothing vSAN enhancements, with a design focused on cloud applications that require data to be contextual, always on, real time, distributed, and scalable.
DataStax Enterprise (DSE) is powered by the best distribution of Apache Cassandra™. The new DataStax Enterprise on VMware vSAN for Development—Solution Overview paper is now available on StorageHub.
Here is a glimpse of this joint solution.
This joint solution showcases VMware vSAN as a Hyper-Converged Infrastructure (HCI) platform for deploying DSE in a vSphere environment:
- We demonstrated an architecture for fast, streamlined deployment.
- We measured the performance of running DSE on vSAN.
- We validated the resiliency and availability against various failures.
We created an 8-node vSAN cluster hosting a 16-node DSE cluster to validate the deployment of DSE at scale under a typical user workload. The solution architecture is shown in Figure 1.
Figure 1. Solution Architecture
To ensure continued data protection and availability of DSE during planned or unplanned downtime, we recommend a minimum of four nodes for the vSAN cluster; an all-flash configuration is required for consistent performance and predictable latency.
In our solution validation, we used NVMe devices for the cache tier and configured two disk groups per node; each disk group has one NVMe cache device and four capacity SSDs. The vSAN storage policy failures to tolerate (FTT) setting is 1, and software checksum is disabled. The storage policy can be customized per DSE application to satisfy performance, resource commitment, failure tolerance, checksum protection, and quality-of-service requirements.
We configured commitlog_sync in batch mode in the cassandra.yaml file; with this setting, DSE does not acknowledge writes until the commit log has been fsynced to disk.
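For reference, the relevant cassandra.yaml fragment looks like the following. The 2 ms batch window is our assumption rather than a value from the validation; tune it for your own latency budget:

```yaml
# Batch commit log sync: writes are not acknowledged until the
# commit log has been fsynced to disk.
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2
```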
The vSAN and DSE testing environment consists of 8 Dell R630 servers hosting the 16-node DSE cluster. An additional 4-node hybrid vSAN cluster hosts DSE OpsCenter and the 8 client nodes used to generate load on the cluster. We also configured a separate storage cluster on the hybrid cluster to avoid any performance impact on the DSE cluster under test.
Table 1 shows the hardware components used in this solution.
Table 1. Hardware Components
| Component | Specification |
|---|---|
| CPU | 2 sockets, 12 cores each at 2.3GHz, with hyper-threading enabled |
| NIC | 2 x 10Gb |
| Storage adapter | SAS controller: Dell LSI PERC H730 Mini; NVMe controller: Intel DC P3700 SSD |
| Disks | Cache-layer SSD: 2 x 1.8TB Intel SSD DC P3700 NVMe; Capacity-layer SSD: 8 x 800GB 2.5-inch Enterprise Performance SAS SSD S3710 |
Table 2 lists the operating system and software used in this solution.
Table 2. Software Components
| Software | Version | Description |
|---|---|---|
| VMware vCenter Server® and ESXi | 6.5.0d (vSAN 6.6 included) | ESXi cluster to host virtual machines and provide the vSAN cluster; vCenter Server provides a centralized platform for managing VMware vSphere environments. |
| VMware vSAN | 6.6 | Software-defined storage solution for hyper-converged infrastructure. |
| Ubuntu | 14.04 | Guest operating system of all the virtual machines. |
| DSE | 5.1 | DataStax Enterprise 5.1. |
| cassandra-stress | 3.10 | Java-based stress testing utility for basic benchmarking and load testing of a Cassandra cluster. |
Table 3 lists the configuration details of DSE VMs.
Best practice: Set the type of virtual SCSI controller to paravirtual and make each virtual disk to use a separate controller.
Table 3. VM Configuration
| Resource | Configuration |
|---|---|
| Memory | 64GB (heap size of 24GB) |
| Disks | OS disk: 40GB; Data disk: 1,150GB; Log disk: 50GB |
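The paravirtual-SCSI best practice above maps to VMX settings along the lines of the sketch below; the disk file names and controller numbering are illustrative assumptions, and your VM's layout will differ:

```
# One paravirtual (pvscsi) controller per virtual disk
scsi0.virtualDev = "pvscsi"    # controller for the OS disk
scsi0:0.fileName = "os.vmdk"
scsi1.virtualDev = "pvscsi"    # controller for the data disk
scsi1:0.fileName = "data.vmdk"
scsi2.virtualDev = "pvscsi"    # controller for the log disk
scsi2:0.fileName = "log.vmdk"
```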
The eight client nodes all run Cassandra-stress, a built-in DSE benchmark tool used for workload testing.
Load Base Data Set
Cassandra-stress tests run on each of the 8 client VMs to parallelize data generation across the cluster. After each run, check nodetool status to see the amount of data per node, and repeat for several rounds on each stress client until every DSE node holds at least 500GB.
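The load step above can be sketched as follows. The row size, thread count, and node name are illustrative assumptions, and the cassandra-stress command is printed rather than executed so the arithmetic can be checked without a live cluster:

```shell
#!/bin/sh
# Derive how many rows each stress client must insert to reach
# at least 500GB of data on every DSE node.
NODES=16                 # DSE nodes in the cluster
CLIENTS=8                # stress client VMs
TARGET_GB_PER_NODE=500   # per-node data target from the validation
ROW_BYTES=1024           # assumed average row footprint (illustrative)

TOTAL_GB=$((NODES * TARGET_GB_PER_NODE))
ROWS_PER_CLIENT=$((TOTAL_GB * 1024 * 1024 * 1024 / ROW_BYTES / CLIENTS))
echo "rows per client: $ROWS_PER_CLIENT"

# The write command each client would run (node name is a placeholder):
echo "cassandra-stress write n=$ROWS_PER_CLIENT cl=QUORUM -rate threads=200 -node dse-node1"
```

In practice you would run the printed command on each client, check nodetool status between rounds, and stop once every node reports at least 500GB.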
Performance Result Examples
In the performance testing, cassandra-stress issues random inserts and reads in a 90% write and 10% read workload. We first pushed the cluster toward its peak performance with no maintenance tasks running, then backed off to a workload we would advise customers to run in a typical scenario by reducing the thread count on each client by 30%, to 140 threads.
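The back-off arithmetic above implies a 200-thread peak per client. A minimal sketch, with the node name as a placeholder and the cassandra-stress command printed rather than executed:

```shell
#!/bin/sh
# Reduce the per-client thread count by 30% from the peak value.
PEAK_THREADS=200   # peak thread count implied by the 30% back-off to 140
BACKOFF_PCT=30
THREADS=$((PEAK_THREADS * (100 - BACKOFF_PCT) / 100))
echo "threads per client: $THREADS"

# 90% write / 10% read mixed workload at the advised thread count:
echo "cassandra-stress mixed ratio(write=9,read=1) duration=24h -rate threads=$THREADS -node dse-node1"
```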
The throughputs and latencies of the 1-hour and 24-hour performance tests (90% write and 10% read) are consistent, which validates that vSAN provides consistent performance at predictable latency.
In the 90% write and 10% read performance tests, median read latency is less than 2ms and median write latency is less than 8ms. The 95th- and 99th-percentile latency numbers are also reasonable.
Resiliency and Availability
vSAN’s storage-layer resiliency features combined with DSE’s peer-to-peer design enable this solution to meet the data availability requirements of even the most demanding applications. A set of failure scenarios are created to validate data availability.
In our failure testing, we again ran cassandra-stress against the preloaded dataset. The vSAN storage policy FTT is set to 1, and the replication factor of the DSE keyspace is set to 3.
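A keyspace matching the replication-factor-3 setup would be defined along these lines; the keyspace and datacenter names are placeholders, not values from the validation:

```sql
-- Three replicas of every row, placed topology-aware by the snitch
CREATE KEYSPACE stress_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
```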
- Initial testing was conducted using EXT4 and required journaling to be enabled.
- Users must use a rack-aware snitch to ensure that multiple data replicas are not stored on the same ESXi host.
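One common way to satisfy the rack-aware requirement is GossipingPropertyFileSnitch, treating each ESXi host as a rack. A sketch, with placeholder names, of cassandra-rackdc.properties on a DSE VM running on the third ESXi server:

```
# cassandra-rackdc.properties (endpoint_snitch: GossipingPropertyFileSnitch
# must be set in cassandra.yaml)
dc=DC1
rack=esxi-host-3
```

With this mapping, NetworkTopologyStrategy spreads replicas across distinct racks where possible, so replicas of the same data do not land on the same ESXi host.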
We conducted three types of failure tests:
- A physical disk failure in a vSAN datastore causes vSAN objects residing on that disk to enter a degraded state. With the storage policy set to FTT=1, the objects survive and continue to serve I/O. Storage-layer resiliency handles this failure, so from the DSE VM's perspective there is no interruption of service.
- A DSE VM failure in the DSE cluster causes the loss of a DSE node, but application resiliency ensures the service is not interrupted; only performance is impacted because the cluster is smaller.
- A physical host failure powers off all the running VMs residing on it. In our validation, the DSE cluster loses two nodes but the service is not interrupted. With VMware vSphere® High Availability enabled, vSphere HA restarts the impacted virtual machines on another host when a host fails.
The failure testing results are summarized in Table 4.
Table 4. Failure Testing Results
| Failure type | Test description | Result |
|---|---|---|
| Disk failure | Fail one disk, and bring it back after 20 minutes. | Performance impact is negligible; after the disk is brought back, vSAN resyncs the data in less than 2 minutes. |
| VM failure | Fail a VM and bring it back after an hour. | Service is not interrupted; performance is degraded as expected due to the loss of a DSE node. |
| Host failure | Power off a host and bring it back after 20 minutes. | Performance is degraded as expected due to the loss of two DSE nodes, but service is not interrupted. Because the host is back within one hour, vSAN only resyncs the data without a component rebuild. |
Overall, deploying, running, and managing DSE applications on VMware vSAN provides predictable performance and high availability. All storage management moves into a single software stack, thus taking advantage of the security, performance, scalability, operational simplicity, and cost-effectiveness of vSAN.
It is simple to expand using a scale-up or scale-out approach without incurring any downtime. With the joint efforts of VMware and DataStax, customers can deploy DSE clusters on vSAN for their modern cloud applications with ease and confidence in test and development environments. Check back for further developments around the future of this partnership.
To learn more, check out the full solutions brief here.