Microsoft SQL Server on VMware All-Flash Virtual SAN – Update from the Reference Architecture Trenches

Over the past few months, we have been working on SQL Server 2014 on All-Flash Virtual SAN, and I wanted to poke my head out and give you, our customers and partners, an update.

First of all, if you aren’t running SQL Server on Virtual SAN yet, let me give you three reasons why you might want to consider it:

  • 50% lower TCO overall by deploying SQL Server on cost-effective, industry-standard server components, which removes large upfront investments. TCO improves further with storage efficiency features like deduplication and enhanced automation capabilities.
  • Virtual SAN delivers enterprise availability for the most demanding business critical applications, capable of delivering 99.9999% uptime and beyond with built-in and tunable failure tolerance settings (see the Virtual SAN Delivers Enterprise Level Availability blog for more information).
  • Virtual SAN provides the simplicity of managing storage along with compute and networking in a single, tightly integrated interface–the vSphere Web Client.

In Virtual SAN 6.2, we introduced key space efficiency features such as Deduplication, Compression, and Erasure Coding (RAID5/6). During our testing, one of our goals was to drive an OLTP workload and test performance with the new space-saving features enabled. We used a four-node All-Flash cluster with four SQL Server virtual machines and two database sizes: two 200GB databases and two 500GB databases. To drive the workload, we created TPC-E like databases with Dell’s Benchmark Factory for Databases at scales of 20 and 50, which results in 200GB and 500GB databases with 20,000 and 50,000 customer rows respectively.

During our tests, the four virtual machines on the four-node All-Flash Virtual SAN cluster consistently achieved an aggregate of up to 7,965 TPS (transactions per second) with Deduplication & Compression and Checksum enabled, with predictable virtual disk latency averaging 1ms to 2ms for reads and writes. That means that with the space efficiency features of Virtual SAN 6.2 enabled, Virtual SAN provides great performance with minimal impact.

All-Flash Virtual SAN Specifications & Performance

VMware Virtual SAN Disk Group Specification (per Host)

  • SSD: 2 x 400GB Solid State Drive (Intel SSDSC2BA40) as Cache SSD
  • SSD: 8 x 400GB Solid State Drive (Intel SSDSC2BX40) as Capacity SSD

SQL Server Testing Configuration (per VM)

  • Windows Server 2012 R2
  • Storage Footprint:
    • 200GB database: 815GB allocated, 750GB used
    • 500GB database: 1,800GB allocated, 1,600GB used
  • VM configuration:
    • 200GB database: 24 vCPU, 80GB memory
    • 500GB database: 32 vCPU, 160GB memory

First, we focused on TPC-E like performance on the Virtual SAN with Deduplication & Compression and Checksum disabled. We measured 1,905/1,906 TPS on the 200GB databases and 2,051/2,158 TPS on the 500GB databases. In aggregate, cluster-wide performance measured 8,022 TPS. Average disk read and write latency ranged from 1ms to 2ms.

fig1-1

Then, we enabled Deduplication & Compression and Checksum on the Virtual SAN. We measured 1,850/1,851 TPS on the 200GB databases and 2,092/2,172 TPS on the 500GB databases. In aggregate, cluster-wide performance measured 7,965 TPS. Average disk read and write latency again ranged from 1ms to 2ms.

fig1-2

In the summarized table, the aggregate TPS for the four test scenarios ranged from 7,880 to 8,022.
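As a quick sanity check, the per-VM TPS figures quoted above can be summed for each configuration. This is only a sketch: the list names and helper functions are illustrative, not part of any test tooling, and the per-VM values are presumably rounded, so the sums land on or within a couple of TPS of the reported aggregates of 7,965 and 8,022.

```python
# Per-VM TPS figures quoted in the text (two 200GB VMs, then two 500GB VMs).
# Variable and function names are illustrative assumptions.
tps_disabled = [1905, 1906, 2051, 2158]  # Dedup & Compression and Checksum disabled
tps_enabled = [1850, 1851, 2092, 2172]   # Dedup & Compression and Checksum enabled


def aggregate(per_vm_tps):
    """Cluster-wide TPS is the sum of the per-VM TPS values."""
    return sum(per_vm_tps)


def relative_change_pct(baseline, measured):
    """Percentage change of `measured` relative to `baseline`."""
    return (measured - baseline) / baseline * 100


agg_disabled = aggregate(tps_disabled)
agg_enabled = aggregate(tps_enabled)

print(f"Aggregate TPS, features disabled: {agg_disabled}")
print(f"Aggregate TPS, features enabled:  {agg_enabled}")
print(f"Relative change: {relative_change_pct(agg_disabled, agg_enabled):+.2f}%")
```

On these numbers, the difference between the two configurations is well under 1% of aggregate throughput.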

For the SQL Server TPC-E like test, the variable we pay the most attention to is average disk latency. We measured VMware Virtual SAN disk write latency ranging from 1.7ms to 2.1ms for the various scenarios with FTT=1 (under the default Virtual SAN policy). After changing the SPBM policy to Erasure Coding (RAID5), the average virtual disk write latency increased to 4.4ms. The average disk read latency was less than 2ms in all test scenarios.

fig-table_s2

Space Saving by Enabling Deduplication and Compression and EC (RAID5) Policy

We measured the storage space reduction for structured data (an OLTP/TPC-E like database) after placing the databases on the All-Flash Virtual SAN with Deduplication and Compression, and Erasure Coding enabled.

Deduplication and Compression are applied on a “per disk group” basis, and the results of deduplication vary for different kinds of data. As for Erasure Coding, before Virtual SAN 6.2, deploying a 100GB VM with FTT defined as 1 required around 200GB of capacity on Virtual SAN. With Erasure Coding introduced in Virtual SAN 6.2, the required capacity is significantly lower. You can now configure a 3+1 (RAID5) or a 4+2 (RAID6) layout. From a capacity standpoint, this means you need about 1.33x the space of a given disk when 3+1 is used, or 1.5x when 4+2 is used.
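The capacity multipliers follow directly from the ratio of total components to data components. The sketch below works through that arithmetic for the 100GB VM example; the function names are illustrative, and the calculation deliberately ignores witness components, metadata, and any deduplication or compression savings.

```python
from fractions import Fraction


def erasure_coding_multiplier(data_components: int, parity_components: int) -> Fraction:
    """Raw capacity per unit of usable data for a data+parity layout."""
    return Fraction(data_components + parity_components, data_components)


def mirror_multiplier(ftt: int) -> Fraction:
    """Raw capacity per unit of usable data when keeping ftt + 1 full copies."""
    return Fraction(ftt + 1)


vm_size_gb = 100  # the example VM size used in the text

layouts = {
    "RAID1 mirror (FTT=1)": mirror_multiplier(1),
    "RAID5 (3+1)": erasure_coding_multiplier(3, 1),
    "RAID6 (4+2)": erasure_coding_multiplier(4, 2),
}

for name, multiplier in layouts.items():
    raw_gb = float(vm_size_gb * multiplier)
    print(f"{name}: {float(multiplier):.2f}x -> about {raw_gb:.0f} GB raw for a {vm_size_gb} GB VM")
```

For the 100GB VM, this gives roughly 200GB for mirroring, 133GB for 3+1, and 150GB for 4+2.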

To measure the space savings in a realistic OLTP deployment with Deduplication & Compression and Erasure Coding enabled, we deployed five virtual machines in the All-Flash Virtual SAN cluster: two virtual machines each hosting a 200GB database, two virtual machines each hosting a 500GB database, and one domain controller.

The provisioned space for each 200GB database virtual machine is 680GB (100GB OS, 2 x 200GB data disks, 1 x 100GB log disk, and 1 x 80GB tempdb disk). The provisioned space for each 500GB database virtual machine is 1,360GB (100GB OS, 4 x 250GB data disks, 1 x 100GB log disk, and 2 x 80GB tempdb disks), and the provisioned space of the Domain Controller virtual machine is 100GB. Under the default Virtual SAN policy, the provisioned space was more than 8TB. Virtual SAN reported the physical written space after deployment as 5,050GB. With the five virtual machines thin-provisioned, the actual space used was 2,020GB, and the Deduplication and Compression ratio was around 2.27x. After we changed the SPBM policy to RAID5 (Erasure Coding), the space usage was 1,900GB, for a corresponding space saving of around 2.66x.
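The provisioned-space figures can be reproduced from the disk layouts listed above. The sketch below is illustrative only: the variable names are assumptions, and it treats the default policy as a simple two-copy mirror (FTT=1), ignoring witness and metadata overhead.

```python
# Disk layouts in GB, taken from the text. Names are illustrative assumptions.
vm_200gb_disks = [100, 200, 200, 100, 80]               # OS, 2 data, log, tempdb
vm_500gb_disks = [100, 250, 250, 250, 250, 100, 80, 80]  # OS, 4 data, log, 2 tempdb
domain_controller_disks = [100]

per_vm_200gb = sum(vm_200gb_disks)
per_vm_500gb = sum(vm_500gb_disks)

# Two VMs of each database size plus one domain controller.
total_provisioned_gb = (
    2 * per_vm_200gb + 2 * per_vm_500gb + sum(domain_controller_disks)
)

# Assumption: the default policy mirrors every object once (two full copies).
mirrored_provisioned_gb = 2 * total_provisioned_gb

print(f"200GB database VM: {per_vm_200gb} GB")
print(f"500GB database VM: {per_vm_500gb} GB")
print(f"All five VMs:      {total_provisioned_gb} GB")
print(f"With FTT=1 mirror: {mirrored_provisioned_gb} GB")
```

The mirrored total of 8,360GB is consistent with the "more than 8TB" of provisioned space under the default policy.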

fig1-3

We also compared the Deduplication and Compression ratio of Virtual SAN with SQL Server’s native database compression. We used one 200GB database to compare Virtual SAN Deduplication and Compression against native ROW and PAGE compression. The space saving on Virtual SAN was higher than with row-level database compression (40.62%) but lower than with page compression (58.49%). However, the two methods can work together. The advantage of All-Flash Virtual SAN Deduplication and Compression is that it operates at the storage level and does not interfere with data manipulation in the database.

fig1-4

fig1-5

 

Summary

Virtual SAN is optimized for modern all-flash storage, with efficient near-line deduplication, compression, and erasure coding capabilities that lower TCO while delivering incredible performance. Virtual SAN 6.2 is ready for any application, with tested and validated deployments of Microsoft SQL Server. This blog is a preview of a comprehensive reference architecture paper that will be published very soon. Stay tuned.

Comments

4 comments have been added so far

  1. Great article, good to see some solid numbers. It would be really interesting to see the deduplication results with a two-node Always-On Availability Group. Given this SQL cluster replicates the data to a second node for read-only traffic, it’s conceivable you’d see a high dedupe result. The combination of a two-node Always-On cluster, DRS anti-affinity rules, dedup, and VSAN FTT=1 seems like a great architecture for HCIA.

  2. Hi Matt,
    Thank you for your comments and valuable suggestion. I don’t have deduplication and compression results for AAG-protected databases. However, we will add it to our plan to verify the space saving ratio of AAG-enabled databases on All-Flash Virtual SAN with FTT=1. If you have other suggestions, please feel free to let me know. Thanks.

  3. Hi Tony,

    I read the recent release of the document… can you explain the relationship between write latency and WAN latency please? Can anything be done to optimise or improve the write latency performance in stretched clusters?

    Thanks

  4. Hi Duncan,

    Thanks a lot for your comments. Please see my suggestions below.

    1. The WAN latency is added to the write latency if you use a stretched cluster with mirrored writes or FTT=1.
    2. There is no special optimization on the vSAN side to improve latency in a stretched cluster. If you are using vSAN 6.6, you can use nested fault domains with PFTT=0 and SFTT=1/2/3 to avoid writes to the remote site. However, that is not an option if you want to protect your VM at the site level.
    3. Third-party vendors such as Silver Peak provide WAN optimization.

    Best Regards,
    Tony
