We’re really excited about the buzz around Oracle in virtualized environments. One of the best-kept secrets is just how well Oracle performs on VMware ESX. This didn’t happen by accident: there are a number of features and performance optimizations in the VMware ESX Server architecture designed specifically for databases.
In this blog, I’ll walk through the top ten most important features for getting the best database performance. Here are a few of the performance highlights:
- Near Native Performance: Oracle databases run at performance similar to that of a physical system
- Extreme Database I/O Scalability: VMware ESX Server’s thin hypervisor layer can drive over 63,000 database I/Os per second (fifty times the requirement of a typical database)
- Multi-core Scaling: Scale up using SMP virtual machines and multiple database instances
- Large Memory: Scalable memory, with 64GB per database and 256GB per host
We’ve continued to invest heavily in optimizing Oracle performance on VMware, because it’s already one of the most commonly virtualized applications. The imminent ESX 3.5 release is our best database platform to date, with several new advanced optimizations.
In this blog article we’d like to explain the unique and demanding nature of database applications such as Oracle, and show the performance capabilities of ESX Server on this type of workload.
The Nature of Databases
Databases have some unique properties, such as a large memory footprint. At the outset this can make them slightly more complex to virtualize well. However, this has proven to be an opportunity, since we can optimize specifically for these defining properties.
- Large Memory: Databases use large amounts of memory to cache their storage. A large cache is one of the most important performance factors for a database, since it can often reduce physical I/O by 10- to 100-fold.
- High Performance Block I/O: Databases read and write their data in fixed, block-sized chunks. The I/Os are typically small, and occur at a very high rate on a small number of files or devices.
- Throughput Oriented: Databases often have a large number of concurrent users, giving them natural parallelism and making them ideally suited to systems with multiple logical or physical processors.
Understanding and Quantifying Virtual Performance
The performance of a virtualized system should first be quantified in terms of latency and throughput, and then in terms of how efficiently resources are being used. For example, if a physical system delivers 10,000 transactions per minute at 500ms latency per transaction, then a virtualized system performing at 100% of native should provide the same throughput with comparable latency. The secondary metric is resource usage: how many additional physical resources were used to achieve the same level of performance. It’s all too easy to focus primarily on the CPU resource, when in reality memory and I/O are much more expensive resources to provision. This matters even more going forward, as multicore CPUs continue to lower the cost per processor core while memory remains at a premium.
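To make this concrete, here is a minimal sketch of how you might compare a virtualized run against a native baseline in these terms. The numbers and the compare_to_native helper are illustrative assumptions, not measured results or a VMware tool.

```python
# Minimal sketch of the metrics discussed above: compare a virtualized run
# against a native baseline in terms of throughput, latency, and CPU cost.
# The numbers below are illustrative placeholders, not measured results.

def compare_to_native(native, virtual):
    """Return throughput ratio, latency delta, and extra CPU consumed."""
    throughput_ratio = virtual["tpm"] / native["tpm"]        # 1.0 == 100% of native
    latency_delta_ms = virtual["latency_ms"] - native["latency_ms"]
    extra_cpu = virtual["cpu_util"] - native["cpu_util"]     # additional host CPU used
    return throughput_ratio, latency_delta_ms, extra_cpu

native  = {"tpm": 10_000, "latency_ms": 500, "cpu_util": 0.60}
virtual = {"tpm": 10_000, "latency_ms": 510, "cpu_util": 0.66}

ratio, lat_delta, cpu_delta = compare_to_native(native, virtual)
print(f"Throughput: {ratio:.0%} of native")
print(f"Latency overhead: {lat_delta} ms per transaction")
print(f"Extra CPU consumed: {cpu_delta:.0%} of the host")
```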
More important for Oracle is the ability to scale up by taking advantage of multi-core CPUs, large memory, and enough I/O throughput through the hypervisor to keep the large number of disk spindles in the backend storage arrays busy.
Database Performance Myths
There are a few common myths about virtualizing databases:
- Databases have a high overhead when virtualized: Virtualized databases can perform at or near the speed of physical systems, in terms of both latency and throughput. The virtualization overhead for typical real-world databases is minimal; for VMware ESX Server, we measured CPU overhead to be less than 10%.
- Databases have too much I/O to be virtualized: Databases typically have a large number of small random I/Os, and it is in theory possible to hit a scaling ceiling in the hypervisor layer. VMware ESX’s thin hypervisor layer can drive over 63,000 database I/Os per second, which is equivalent to more than 600 disk spindles of I/O throughput. This is sufficient I/O scaling for even the largest databases on x86 systems.
- Virtualization should only be used for smaller, non-critical applications: The ESX hypervisor is very robust: many customers are seeing over two years of uptime from ESX-based systems. In addition, the ESX hypervisor remains stable even if resources are overcommitted.
There isn’t one quick fix that makes databases work well for a wide range of real-world applications. Good performance is earned through the long-term discipline of gathering lessons from many real-world customer database workloads, and applying those lessons across the architecture of the hypervisor.
Let’s take a quick walk through the specific features that you should look for in the hypervisor for good database performance.
1: High Performance I/O in VMware ESX
Throughput and latency of the I/O system are critical to the performance of online transaction processing systems. Since transactional database systems operate on small data items at random locations in the dataset, it’s important to measure random I/O throughput (in I/O operations per second) rather than bandwidth (MB/s).
Figure.1 VMware ESX I/O Driver Model
Since the hypervisor logically resides between the database in the guest virtual machine and the backend storage, it is critical that the hypervisor’s I/O facilities scale up without any performance ceilings, and don’t add any appreciable latency. The I/O subsystem in VMware ESX shown in Figure.1 uses a direct driver model, so that there is minimal latency added by the virtualization stack. This is possible because I/O requests can be handled in-line by the same processor as the requesting virtual machine (other architectures add substantial latency and CPU overhead when I/O is proxied via a heavy-weight domain-0 or parent-partition).
Oracle databases typically issue many small 4KB or 8KB I/Os in a random access pattern. For these I/Os, a single typical disk can deliver on the order of 100-200 I/Os per second, depending on the rotational speed of the disk, though in practice it’s best not to push the drives beyond 100 IOPS each. The throughput of the VMware ESX 3.5 hypervisor has been increased significantly: more than 60,000 I/Os per second can be sustained, the equivalent throughput of over 600 disks.
Results from a VMware study of its customers showed that across 15,000 Oracle servers, the average I/O rate for a loaded four-processor system is 1,280 I/Os per second, which is approximately the throughput of 15 disks. Since some workloads are more demanding than others, and some are bursty in nature, it’s important to have substantially more headroom. The throughput capability of ESX’s I/O subsystem is sufficient for even the most demanding databases, with room to spare.
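For the curious, here is the back-of-the-envelope arithmetic behind these figures, assuming roughly 100 sustainable random IOPS per spindle as described above; the helper below is just an illustration, not a sizing tool.

```python
# Back-of-the-envelope arithmetic behind the figures quoted above.
# Assumes roughly 100 sustainable random IOPS per disk, per the rule of
# thumb in the text; adjust for your own spindles.

IOPS_PER_DISK = 100
hypervisor_ceiling = 63_000   # measured ESX 3.5 I/O rate from the text
typical_db_iops = 1_280       # average for a loaded four-processor Oracle server

equivalent_disks = hypervisor_ceiling / IOPS_PER_DISK
headroom = hypervisor_ceiling / typical_db_iops

print(f"Hypervisor ceiling is equivalent to ~{equivalent_disks:.0f} disks of I/O")
print(f"Headroom over a typical database: ~{headroom:.0f}x")
```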
2: Scale Up using Virtual SMP
VMware ESX can take advantage of systems with multiple physical
processors in two ways, by scaling out through multiple virtual
machines, and by scaling up each virtual machine to use more than one
physical processor. VMware ESX provides a Virtual SMP capability,
allowing up to four processors in each guest virtual machine, and up to
64 processors in the physical system.
Figure.3 – Virtual SMP
Since database workloads typically have a large number of concurrent users, they are naturally parallel and can easily process more than one task at a time.
Oracle is able to take advantage of VMware’s Virtual SMP, so
performance can be scaled beyond a single processor for each virtual
machine. To demonstrate this, we ran several benchmarks with Oracle
database 10g Release 2, using the popular SwingBench on-line transaction processing
workload. Figure.5 shows the throughput of Oracle with an
increasing number of processors in a single virtual machine. The
benchmark measures transaction throughput, and shows 94% scaling as additional processors are added. Incidentally, this matches the scaling we see on native hardware, suggesting that the small shortfall from perfect scaling is due to hardware and database scaling artifacts rather than to virtualization.
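One way to read the 94% figure is that each additional virtual CPU contributes roughly 94% of a single CPU's throughput. Under that assumption (and with a purely hypothetical single-CPU baseline), a quick sketch of the projected scaling looks like this:

```python
# A small model of SMP scaling, assuming (as one reading of the 94% figure)
# that each additional virtual CPU adds about 94% of a single CPU's throughput.
# base_tps is a hypothetical single-CPU throughput, not a measured number.

def projected_throughput(base_tps, vcpus, per_cpu_efficiency=0.94):
    return base_tps * (1 + per_cpu_efficiency * (vcpus - 1))

base_tps = 1_000   # hypothetical 1-vCPU throughput
for n in (1, 2, 4):
    print(f"{n} vCPUs: ~{projected_throughput(base_tps, n):,.0f} transactions/s")
```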
One of the key requirements for consolidation is good scalability with a large number of database instances. To show this, we ran multiple SMP instances of Oracle 10g on VMware ESX Server 3.5. Figure.3 shows the scaling of the VMware ESX platform when running the open source DVDstore database benchmark. The benchmark is run in client-server mode, so that we can focus on the database tier. In this study, we scaled the benchmark from one to seven dual-processor SMP Linux virtual machines, each with its own database instance. We’ll be posting further details of this benchmark on VROOM soon.
3: Scale up with Large Memory
Oracle databases love memory. The primary use is for caching pre-compiled SQL queries and caching blocks from disk in memory.
Database designers go to significant effort to avoid disk I/O whenever possible. This is because the latency of a disk I/O is substantially higher than the time a transaction spends on the CPU: a disk I/O takes on the order of 10 milliseconds, while the typical transaction takes just a few milliseconds of CPU time. If an I/O to disk can instead be satisfied from memory, it can be serviced in a fraction of a millisecond. In addition, since disks are expensive, the cost of the storage system for a database is often driven more by the I/Os per second it must deliver than by raw capacity. Lowering the required disk I/O rate can therefore significantly lower the cost of the system.
Larger memory sizes help Oracle by caching more disk blocks in memory. Consider this simple example: if a database system is using memory to cache its disk I/O and is achieving a 90% cache hit rate, then one in every ten accesses causes a physical I/O. At 10,000 accesses per second, we would see 1,000 I/Os per second. If we increase memory to improve the cache hit rate to 99%, then only one in a hundred accesses goes to disk, reducing the physical I/O tenfold to just 100 I/Os per second.
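The arithmetic in this example is simple enough to capture in a few lines; the sketch below just restates the hit-rate calculation above.

```python
# The cache-hit arithmetic from the example above: physical I/O rate as a
# function of the logical access rate and the buffer-cache hit ratio.

def physical_iops(logical_accesses_per_sec, cache_hit_rate):
    return logical_accesses_per_sec * (1 - cache_hit_rate)

accesses = 10_000
for hit_rate in (0.90, 0.99):
    print(f"{hit_rate:.0%} hit rate -> "
          f"{physical_iops(accesses, hit_rate):,.0f} physical I/Os per second")
```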
Often, over 80% of the memory used by the guest operating system is used by the Oracle disk block cache. A general rule of thumb is that the database cache should be sized at 5-10% of the database size, and that doubling memory improves throughput by about 20%. This is obviously very workload dependent, but larger memory sizes help improve resource efficiency in other areas of the system, and generally, more is better. For these reasons, large memory is very important for databases. VMware ESX 3.0 allowed 16GB of RAM per guest; 3.5 increases this to 64GB per guest.
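If you want to turn these rules of thumb into a rough sizing estimate, a small sketch might look like the following. Both helpers encode only the heuristics above, and real behavior is highly workload dependent.

```python
# Rough sizing helper based on the rules of thumb above: cache at 5-10% of
# the database size, and roughly +20% throughput per doubling of memory.
# These are heuristics only, not guarantees.

def suggested_cache_gb(db_size_gb):
    return (0.05 * db_size_gb, 0.10 * db_size_gb)

def throughput_gain_from_memory(doublings, gain_per_doubling=0.20):
    return (1 + gain_per_doubling) ** doublings - 1

low, high = suggested_cache_gb(500)   # hypothetical 500GB database
print(f"Suggested cache: {low:.0f}-{high:.0f} GB")
print(f"Estimated gain from 4x memory: ~{throughput_gain_from_memory(2):.0%}")
```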
Because consolidating workloads drives up processor utilization, we can squeeze more workloads onto a single system. This means that the memory requirement per physical processor is, on average, twice that of a traditional unvirtualized system. To accommodate these growing requirements, we’ve pushed the memory scalability curve considerably in ESX 3.5, which now supports up to 256GB of RAM on the new high-end systems from Sun, IBM and Unisys.
4: Large Pages in ESX 3.5 Hypervisor
Oracle databases have used large pages in the CPU’s MMU to optimize memory performance for some time. This facility is used with the operating system’s large page feature, typically for the large shared memory segments that hold the database’s disk block cache. Large pages are supported on Linux, Windows and Solaris guests. Oracle typically yields a 5-20% performance advantage with large pages, depending on the type of processor and the size of the configured memory.
Other x86 hypervisors don’t provide virtual large page capability, so this optimization is lost when the database is virtualized. The ESX 3.5 hypervisor provides advanced large-page support which allows the database to properly exploit the CPU’s large page capability.
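On the guest OS side, the main sizing question is how many huge pages to reserve for the Oracle SGA. The sketch below assumes Linux with the common 2MB x86 huge page size (check Hugepagesize in /proc/meminfo on your system); the slack value is an arbitrary cushion for SGA granule rounding, not an Oracle or VMware recommendation.

```python
# A minimal sizing sketch for the guest OS side of large pages on Linux:
# how many huge pages to reserve (vm.nr_hugepages) to back an Oracle SGA.
# Assumes the common 2MB x86 huge page size; verify "Hugepagesize" in
# /proc/meminfo on your system.

HUGE_PAGE_MB = 2

def nr_hugepages_for_sga(sga_gb, slack_pages=16):
    """Huge pages needed for the SGA, plus a small arbitrary cushion."""
    sga_mb = sga_gb * 1024
    return sga_mb // HUGE_PAGE_MB + slack_pages

sga_gb = 12   # hypothetical SGA size
print(f"vm.nr_hugepages = {nr_hugepages_for_sga(sga_gb)}")
```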
5: ESX Optimization for NUMA systems
Many of the interesting new hardware systems today are implemented using non-uniform memory architectures. This means that not all memory is of uniform speed – accessing memory that is closer topologically to the processor is faster than memory that is further away.
To ensure optimal performance, the VMware ESX hypervisor allocates memory for a guest operating system from physical memory near the CPUs on which the guest is running.
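A toy model shows why this matters: average memory latency degrades quickly as more accesses land on a remote node. The local and remote latencies below are illustrative placeholders, not measurements of any particular system.

```python
# A toy model of why NUMA-aware placement matters: average memory latency
# as a function of the fraction of accesses that stay on the local node.
# The latencies are illustrative placeholders, not measured values.

LOCAL_NS, REMOTE_NS = 80, 140

def avg_latency(local_fraction):
    return local_fraction * LOCAL_NS + (1 - local_fraction) * REMOTE_NS

for frac in (0.5, 0.9, 1.0):
    print(f"{frac:.0%} local accesses -> {avg_latency(frac):.0f} ns average latency")
```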
6: High-performance Paravirtualized Networking
Paravirtualization is a term used to describe when the guest operating system has some knowledge of the hypervisor, and can leverage this knowledge to optimize its execution in concert with the hypervisor. The VMware ESX hypervisor uses paravirtualized networking drivers in the guest operating system to provide high performance networking. These drivers are installed automatically through the VMware Tools package when the guest is first powered on. Unlike CPU paravirtualization, paravirtualized drivers do not require any changes to the guest operating system; they are simply installed as transparent new drivers.
VMware ESX can drive gigabit Ethernet at line rate, as demonstrated in the paper Networking Performance in Multiple Virtual Machines. The networking performance of ESX 3.5 has been further improved by incorporating new stateless offload features, such as large segment offload (LSO) and jumbo frames, and now achieves near line rate (9.9 Gbits/s) on 10 Gigabit Ethernet.
The performance of networking can be increased beyond that of one NIC by scaling across multiple NICs. Figure.6 shows the scaling of gigabit Ethernet performance as multiple NICs are added.
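It helps to keep the packet-rate arithmetic in mind when thinking about line-rate networking, since it shows why features like LSO and jumbo frames matter at 10Gbit speeds. The sketch below ignores Ethernet framing overhead, so the numbers are rough upper bounds.

```python
# Rough packet-rate arithmetic behind line-rate networking: how many frames
# per second the hypervisor must handle at a given link speed and MTU.
# Ignores Ethernet framing overhead, so these are upper-bound estimates.

def frames_per_second(link_gbits, mtu_bytes):
    return (link_gbits * 1e9 / 8) / mtu_bytes

for link, mtu in [(1, 1500), (10, 1500), (10, 9000)]:
    print(f"{link} Gbit/s, MTU {mtu}: ~{frames_per_second(link, mtu):,.0f} frames/s")
```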
7: Use VMware ESX’s Page Sharing to use less memory
The ESX hypervisor can safely share physical memory pages that have the same contents through a facility known as transparent page sharing. Through page sharing, the hypervisor arranges for a single physical page of memory to back multiple identical pages in the guests, so that just one copy of the data needs to reside in memory. Using this technique, the total amount of memory consumed is less than the sum of the parts. The hypervisor ensures full security isolation: if one guest modifies a page, it gets its own private copy.
This facility can be used effectively to save memory with Oracle in several ways. When there are multiple instances of a database running, the page sharing facility will automatically share the code portions of the operating system and the Oracle instance. This often results in savings on the order of a few hundred megabytes of memory per virtual machine.
When multiple databases hold similar data (for example, a shared reference table or multiple copies of a database used for development), ESX can automatically detect the duplicate disk blocks in the Oracle disk block cache and arrange to share those. Thus, the database cache memory image can be transparently shared across database instances and across virtual machines. This can yield further savings ranging from tens of megabytes (at a minimum, the system tables will be the same) to several gigabytes, depending on the amount of common disk data between the instances.
As an additional benefit, some memory can be shared within each instance; this is most often zero-filled pages.
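A simple way to estimate the consolidated footprint is to count each virtual machine's private memory once and the shared portion only once. The per-VM sizes below are hypothetical assumptions for illustration.

```python
# A simple estimate of consolidated memory footprint with page sharing:
# each VM's private memory plus a single copy of the common pages.
# The per-VM numbers are illustrative assumptions, not measured values.

def consolidated_memory_gb(num_vms, per_vm_gb, shared_gb):
    private_gb = per_vm_gb - shared_gb
    return num_vms * private_gb + shared_gb   # one copy of the shared portion

vms, per_vm, shared = 6, 8, 1.5   # hypothetical: 6 VMs, 8GB each, 1.5GB in common
without = vms * per_vm
with_sharing = consolidated_memory_gb(vms, per_vm, shared)
print(f"Without sharing: {without} GB, with sharing: {with_sharing} GB "
      f"(saves {without - with_sharing} GB)")
```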
8: Paravirtualized CPU
Figure.7 – Virtualization Techniques
There are various techniques used to virtualize the x86 instruction set, including binary translation, paravirtualization and hardware assist. Binary translation has long been used by the VMware hypervisor to provide near-native performance for many workloads. CPU paravirtualization and hardware assist are two approaches that can provide small additional gains for workloads with many system calls, as well as certain memory optimizations. No single approach is best for all workloads, and in VMware ESX, different approaches are used for different workloads. Ole Agesen and Keith Adams explain the different technologies in their paper on the performance of virtualization.
VMware ESX can optionally use paravirtualization for some guest operating system types. In a recent study, Oracle 10g R2 running the SwingBench online transaction processing workload on a paravirtualized Linux guest showed a moderate gain of 10% when using paravirtualized CPU interfaces.
9: The Best Oracle-Windows Performance
Since all of the key CPU, memory and I/O virtualization capabilities are in a portable layer of the hypervisor, the performance of Oracle on Windows guests is equivalent to that of Linux guests. Oracle on Windows is able to take advantage of large-pages, SMP, and I/O scalability as well as our high performance paravirtualized networking drivers.
For Linux, Solaris and Unix administrators, this means that you have the freedom to choose the OS which has the best tools to facilitate your deployment. For Windows administrators, it means that you can confidently run your Oracle databases on Windows, with the same levels of performance and scalability.
10: Universal 32-bit and 64-bit Guest Support
To take advantage of more than 3.5GB of memory in a guest, databases need to be configured as 64-bit applications and use a 64-bit capable operating system. VMware ESX allows mixed 32-bit and 64-bit guests to run concurrently, thus simplifying the deployment of 64-bit guests when needed.
Summary
To all the database administrators out there: watch for more VROOM posts about VMware performance with Oracle, and for the new Oracle portal, which will contain plenty of good resources for virtualizing Oracle. There is also a new Oracle Discussion Forum; feel free to discuss Oracle performance over there too. Virtual database performance has never been so good!