TPC-C Benchmark is an on-line transaction processing (OLTP). (TPCC Main site) TPC-C uses a mix of five concurrent transactions of different types and complexity. The database is comprised of nine types of tables with a wide range of record and population sizes. TPC-C is measured as transactions per minute (TPM).
The goal of this exercise was to see if 1 million TPM can be achieved on virtualized SQL 2014 backed by an all Flash storage array for a TPC-C like test. The TPC-C testing would be compared between two VM sizes (Within NUMA & Exceeding NUMA boundaries)
Infrastructure Components:
The following components were used for testing OLTP performance on an all Flash storage array:
- Dell R910 server with 40 (2.4 GHZ )cores and 256 GB RAM
- Pure Storage FA-420 FlashArray with two shelves that included 44 238 GB Flash drives and 8.2 TB usable capacity.
- SQL 2014 Virtual machine with 8 or 20 vCPU, 60 or 200 GB RAM, Four 200 GB Data Drives and two 100 GB Database Log drives & a separate TEMPDB drive.
- SW ISCSI on dedicated 10GBPS ports load balanced across two dedicated 10 GBPS network interfaces with jumbo frames enabled.
- DB Hammer SQL Server TPC-C load generator.
Test Virtual Machine:
The virtual machine used for testing was a Windows 2012 Server R2 with SQL 2014 with all database disks stored on a Flash based array.
Two different virtual machine sizes were used for the testing
(1) 8 vCPU & 60 GB RAM (Within NUMA boundary)
(2) 20 vCPU & 200 GB RAM (vNUMA enabled) used for Monster VM test.
Each of these VMs had the following common attributes relating to disk configurations:
- Boot Disk: C Drive and swap space on shared LUN with other virtual machines
- VM had 4 PVSCSI adapters with disks evenly distributed across them.
- 4 Data Disks across separate LUNS on the Pure Storage and 2 Log Disks across separate LUNS on the Pure Storage.
- A disk for TempDB on a dedicated LUN
Preparing the Data Disk:
As per SQL best practices (here a link to Microsoft’s SQL Server best practices assets: “Microsoft Technet”; and here is VMware’s best practices for virtualizing SQL Server: “VMware for SQL Best Practices” ), the data and the log disks were formatted in Windows with 64K block size. Four SCSI adapters were used in the SQL virtual machine with the disks evenly distributed across all SCSI controllers. Four 110 GB disks on dedicated LUNS hosted on a Pure Storage FlashArray were used for the DB data files. Two 50 GB disks on dedicated LUNS Pure Storage FlashArray were used for the DB log files.
External Test Machine:
The testing of the OLTP performance will be conducted from an external Windows machine with DB Hammer client installed in it. Multiple virtual users running on a Windows 2008 R2 Server with 4 vCPU & 16 GB RAM
Profiling DB Hammer for optimal number of users.:
DB Hammer was initially profiled with pre-testing to find the optimal number of user sessions per instance. It was found that the best results regarding transactions per minute (TPM) was achieved on a four user configuration for a single DB Hammer session.
SQL Server Optimizations:
SQL server best practices relating to using multiple SCSI adapters with virtual disks distributes across all adapters were implemented. Four Data disks and two log disks each on individual LUNS were configured on the Pure Storage Array and provisioned to the SQL virtual machine. A Separate LUN and corresponding disk was provisioned for TEMPDB space for optimized performance. There were multiple tempdb files created to optimize performance.
DB Hammer Setup:
DB Hammer was setup for SQL TPC-C testing. The number of warehouses for the TPC-C Testing was set to 20 and the corresponding schemas and databases were created. The database data files were spread across four data disks E,F,G,H and the database log files were distributed across the two dedicated disks I and J.
Phase 1: Testing for virtual machine within NUMA boundary: (8 vCPU, 60 GB RAM)
The first set of tests were performed by sizing the SQL server with 8 vCPU and 60 GB RAM. As the server has 10 cores and 64GB RAM per socket, this configuration fits within NUMA boundaries.
Testing:
DB Hammer workloads were generated in parallel across 12 individual sessions of 4 users each from the client test machine. The transactions per second were monitored along with SQL server metrics. It was observed that as the number of users and the transactions increased, the CPU utilization of the SQL server increased. The ramp up in the users was stopped when the CPU utilization of the SQL server reached 90%. The
The maximum transactions per minute for this configuration was approximately 650000 as shown by the scale shown below:
Phase 2: Testing for virtual machine crossing NUMA boundary: (20 vCPU, 200 GB RAM)
The next set of tests were performed by sizing the SQL server with 20 vCPU and 200 GB RAM. As the server has 10 cores and 64GB RAM per socket, this configuration extends beyond NUMA boundaries. vNUMA was enabled and settings adjusted for the 10 core processor boundary.
Testing:
DB Hammer workloads were generated in parallel across 20 individual sessions of 4 users each from the client test machine. The transactions per second were monitored along with SQL server metrics. It was observed that as the number of users and the transactions increased, the CPU utilization of the SQL server increased. The ramp up in the users was stopped when the CPU utilization of the SQL server reached 90%. During the testing it was observed that IO workload was in the order 30-40 MBPS. The All FLASH array which can do hundreds of MBPS was therefore never burdened during the tests.
The maximum transactions per minute for this configuration was approximately 1.1 million TPM as shown below:
The test was then repeated once again to confirm the results from the test. More than 1 million TPM was achieved in the second test as well as seen in the maximum value of the scale. Remember to look at the vertical scale rather than the actual number displayed.
Conclusion:
The results show that vSphere is an excellent platform for SQL 2014 OLTP workloads. Running TPC-C like workloads with DB Hammer and achieving more than 1 million transactions per minute validates virtualized SQL for high performance transactional workloads. The results were repeated with a follow on test that showed similar TPM. The Pure Storage All Flash storage array, which is a critical component in the infrastructure performed with sub millisecond latencies throughout the test. The nature of OLTP testing with TPC-C on Microsoft SQL 2014 also showed that the throughput demands on the storage was low.