Unix to Virtualized Linux (U2VL) is a critical step toward the Software-Defined Data Center (SDDC): it aims to migrate applications and data from physical Unix servers to Linux virtual machines running on virtualized x86 infrastructure. These applications are typically business critical, so customers normally take a very cautious approach, running a carefully planned and executed Proof of Concept (POC) to validate performance, availability, and scalability, among many other areas.
My colleagues in China (a big shout out to Tony Wang and his team!) recently did one such POC with a large local bank, and naturally they chose the Virtual SAN hyper-converged architecture for all of the compute and storage needs. The test results were so illustrative of Virtual SAN's benefits that I'd like to share the POC and some of the results here, although I'm not allowed to mention the customer's name for reasons you probably understand.
The business challenge
The customer has been growing rapidly. As a result, it is continuously investing in its IT infrastructure and in ensuring that IT can support the growing needs of the business.
One of the challenges they face is the end-of-day batch processing of their Oracle CRM database, which currently takes upwards of 10 hours. This will only get worse as they acquire more customers. And the customer's target? A mere 3 hours to finish the job.
The existing environment
Like many financial institutions in China, the bank is still a Unix shop for its Oracle databases and applications. The production CRM database runs on an IBM Power 710 Express server, with storage on an EMC VNX 5300 array. Sound familiar?
The database is Oracle 10g on AIX. There are over 5.2 billion records in the 600GB database, with about 50GB of incremental data to process daily. As mentioned, the batch job takes 10+ hours to finish in this environment.
To improve things, the customer brought in an Oracle Exadata Database Machine X5-2 for testing. Granted, the system was configured at one quarter of its maximum capacity, but even so it is a very powerful setup designed specifically for Oracle databases. The processing time improved to around 5 hours, but that was still a long way from the 3-hour target.
Virtual SAN POC configuration
The customer was very intrigued by what they had heard about Virtual SAN: not just its high-flying performance, but also the many other benefits in terms of CAPEX/OPEX savings, resiliency, scalability, and availability. So a Virtual SAN POC got underway.
The 3-node cluster was assembled and set up quickly using Lenovo Virtual SAN Ready Nodes, so the server and storage components were already certified. Each server had the following configuration:
- 2x Intel Xeon E5-2650 v3 2.3GHz
- 16x 16GB TruDDR4 2133MHz LP RDIMM
- 2x S3700 400GB SATA 2.5” MLC Enterprise SSD
- 8x 1.2TB 10K 6Gbps SAS 2.5” HDD
- 2x ServeRAID M5210 SAS/SATA controller
- 1x Emulex VFA5 ML2 Dual Port 10GbE SFP+ adapter
Huawei switches provided the network infrastructure. vSphere 6.0 U1 and Virtual SAN 6.1 were used for the POC.
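To put this hardware in perspective, here is a rough back-of-the-envelope capacity calculation for the cluster. It is only a sketch: it assumes the two SSDs per node serve as the Virtual SAN cache tier (which does not count toward usable capacity in a hybrid configuration), applies the default FailuresToTolerate=1 mirroring policy, and ignores slack space and metadata overhead.

```python
# Back-of-the-envelope capacity math for the 3-node hybrid Virtual SAN cluster.
# Assumes default RAID-1 mirroring with FailuresToTolerate=1 and ignores
# slack space and metadata overhead -- a sketch, not a sizing tool.

NODES = 3
HDDS_PER_NODE = 8          # 1.2TB 10K SAS capacity-tier disks per node
HDD_TB = 1.2
SSDS_PER_NODE = 2          # 400GB S3700 cache-tier SSDs per node
SSD_TB = 0.4
FTT = 1                    # FailuresToTolerate: each object is stored FTT+1 times

raw_capacity_tb = NODES * HDDS_PER_NODE * HDD_TB   # capacity tier only
cache_tb = NODES * SSDS_PER_NODE * SSD_TB          # cache tier (hybrid: not usable capacity)
usable_tb = raw_capacity_tb / (FTT + 1)            # RAID-1 mirroring consumes 2x

print(f"Raw capacity tier: {raw_capacity_tb:.1f} TB")   # 28.8 TB
print(f"Cache tier:        {cache_tb:.1f} TB")          # 2.4 TB
print(f"Usable at FTT=1:  ~{usable_tb:.1f} TB")         # ~14.4 TB before overheads
```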
Test methodology
An Oracle 12c RAC database was created across three Oracle Linux Server 6.6 virtual machines, each residing on one ESXi host, with the following disk layout (a quick capacity sketch follows the list):
- CRS: 20GB x 3
- DATA: 500GB x 8
- REDO: 60GB x 3
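For a sense of scale, here is the provisioned capacity implied by this layout and its footprint on the cluster. Again a sketch, assuming the listed virtual disks are shared among the three RAC nodes (the usual RAC arrangement) and mirrored with the default FailuresToTolerate=1 policy:

```python
# Rough provisioned-capacity math for the RAC disk layout described above.
# Assumes the listed virtual disks are shared across the three RAC nodes
# and mirrored with FailuresToTolerate=1.

layout_gb = {
    "CRS":  (20, 3),    # (size in GB, number of disks)
    "DATA": (500, 8),
    "REDO": (60, 3),
}

provisioned_gb = sum(size * count for size, count in layout_gb.values())
raw_footprint_gb = provisioned_gb * 2   # FTT=1 keeps two full copies of every object

print(f"Provisioned database capacity: {provisioned_gb} GB")    # 4240 GB (~4.2 TB)
print(f"Raw footprint on Virtual SAN:  {raw_footprint_gb} GB")  # 8480 GB (~8.5 TB)
```

At roughly 8.5TB of raw footprint, the layout fits comfortably within the ~14TB of usable capacity estimated earlier, with room to spare for the 50GB daily increments.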
expdp was used to export the production database, and impdp was then used to import it into the newly created RAC database on Virtual SAN. Finally, the same batch processing job, consisting of data extraction, data loading, data cleansing, the main procedure, and the reporting procedure, was run.
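For readers planning a similar migration, the export/import step at the start of this flow looks roughly like the sketch below. The connection strings, directory object, dump-file names, and parallel setting are placeholders; the actual Data Pump commands and options the team used were not part of this write-up.

```python
# Minimal sketch of the Data Pump export/import step, driven from Python.
# In practice expdp runs against the source (AIX/Oracle 10g) database and
# impdp against the target 12c RAC database; this only shows the shape of
# the commands, with placeholder credentials, aliases, and file names.
import subprocess

def run(cmd):
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Export the production CRM database from the source system.
run([
    "expdp", "system/password@crmprod",
    "directory=DATA_PUMP_DIR",
    "dumpfile=crm_full_%U.dmp",
    "logfile=crm_expdp.log",
    "full=y",
    "parallel=4",
])

# 2. After copying the dump files to the target, import into the 12c RAC
#    database running on the Virtual SAN cluster.
run([
    "impdp", "system/password@crmrac",
    "directory=DATA_PUMP_DIR",
    "dumpfile=crm_full_%U.dmp",
    "logfile=crm_impdp.log",
    "full=y",
    "parallel=4",
])
```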
Test results
The Virtual SAN test results were stunning. With a modestly configured hybrid test bed, the same batch processing job finished in 1 hour and 48 minutes! The following chart shows a detailed breakdown of the test results.
With Virtual SAN having hit it out of the park on performance, the customer moved on to additional tests. Some highlights included:
- Live migration of a RAC node between ESXi hosts: completed in 16 seconds with no downtime
- RAC node restart and database recovery on another host after a simulated host failure: 125 seconds to restart the node and 116 seconds to auto-recover the database
- Virtual SAN resiliency with simulated disk failures: the database stayed up and running the whole time
- Manageability: integration with vRealize Operations to provide systematic administration, monitoring, and root-cause analysis, in addition to the Health Check tool
And more …
Well, things will get even more interesting when you start looking at the costs of the various solutions. Based on the hardware configurations, we pulled public list price information and did a rough TCO analysis. The following chart says it all:
Final thoughts
A modest 3-node Virtual SAN hybrid cluster, at a fraction of the cost of other solutions, can deliver much better performance. In this customer's case, we were able to improve the daily batch processing time from the current 10+ hours to under 2 hours. In the process, we paved the way for the bank to transform its IT infrastructure into an SDDC-based architecture.
Virtual SAN is also an excellent solution for Oracle OLTP workloads, as covered in the posts Oracle On VSAN 6 and Oracle Real Application Clusters On Virtual SAN. Additional work is being done with Oracle on Virtual SAN, particularly around the upcoming Virtual SAN release and all-flash configurations. More to come!