Virtual Volumes: A game changer for operations of virtualized business critical databases

This is the first in a series of posts on deploying vSphere Virtual Volumes for Tier 1 business-critical databases. Although this article focuses on Oracle databases, much of the discussion applies to any mission-critical application.
Business-critical databases are among the last workloads to be virtualized in enterprises, primarily because of the challenges they pose with growth and scale. Typically, the low-hanging fruit is virtualizing the development, testing/QA, and staging databases after a successful proof of concept (POC), and only then moving on to the big ones: the production databases.

Several common concerns inhibit and delay the virtualization of business-critical databases:
• Virtualized business-critical databases must meet strict SLAs for performance, and storage has traditionally been the slowest component.
• Databases grow quickly, while at the same time there is a need to reduce backup windows and their impact on system performance.
• Databases regularly need to be cloned and refreshed from production to QA and other environments, but the size of modern databases makes this increasingly difficult.
• Databases of different levels of criticality need different storage performance characteristics and capabilities.
• There is a never-ending debate between DBAs and system administrators over file systems vs. raw devices and VMFS vs. RDM. This debate stems primarily from deficiencies that virtualization had in the past.

Levels of database operations in VMware environments

Generally speaking, there are three levels at which regular database operations (backup, cloning, and so on) can be triggered: the application level, the vSphere level, and the storage level.

Each approach has benefits and drawbacks. Application-level operations (for example, Oracle RMAN or SQL) may provide finer operation granularity, but their performance is not optimal. vSphere-level operations offer VM granularity, but a VM-level snapshot will stun the VM for some time during snapshot coalescing or deletion (KB 1002836: A snapshot removal can stop a virtual machine for long time). Finally, storage-level operations offer better performance but lack VM granularity, because they are executed at the LUN level.

The ideal solution to address database operation challenges

An ideal solution would combine the built-in storage capabilities with the granularity of VM-level operations, like snapshots. More specifically:
• It should be able to trigger backups and clones with VMDK granularity.
• It should take storage-level snapshots triggered from the VM level, which is the fastest and most desirable of the three approaches described above.
• It should allow different database components to be aligned with the different storage data services they need.
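
The last point can be made concrete with a small sketch. The Python below is purely illustrative: the component names (data files, redo logs, archive logs), the StoragePolicy fields, the VMDK file names, and the policy values are all assumptions chosen for the example, not VMware APIs or sizing recommendations. It only shows the idea of attaching a different set of data services to each virtual disk rather than to a whole LUN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical set of storage data services requested for one virtual disk."""
    name: str
    array_snapshots: bool
    replication: bool
    performance_tier: str  # illustrative label, not a real product setting


@dataclass(frozen=True)
class VirtualDisk:
    vmdk: str       # virtual disk file name (made up for this example)
    component: str  # database component stored on the disk


# Assumed mapping from database component to policy; all values are placeholders.
POLICY_BY_COMPONENT = {
    "datafiles": StoragePolicy("db-data", array_snapshots=True,
                               replication=True, performance_tier="high"),
    "redo": StoragePolicy("db-redo", array_snapshots=False,
                          replication=True, performance_tier="highest"),
    "archive": StoragePolicy("db-archive", array_snapshots=True,
                             replication=False, performance_tier="standard"),
}


def assign_policies(disks):
    """Return {vmdk: policy}, giving each virtual disk its own policy."""
    return {d.vmdk: POLICY_BY_COMPONENT[d.component] for d in disks}


def disks_to_snapshot(disks):
    """Select only the disks whose policy asks for array snapshots."""
    assigned = assign_policies(disks)
    return [vmdk for vmdk, policy in assigned.items() if policy.array_snapshots]


# A hypothetical Oracle VM with one virtual disk per database component.
oracle_vm = [
    VirtualDisk("oradb_data.vmdk", "datafiles"),
    VirtualDisk("oradb_redo.vmdk", "redo"),
    VirtualDisk("oradb_arch.vmdk", "archive"),
]

for vmdk, policy in assign_policies(oracle_vm).items():
    print(f"{vmdk}: {policy.name} (tier={policy.performance_tier})")
```

With a LUN-level approach, all three disks would share whatever services the LUN provides. In this sketch, each disk carries its own policy, so an operation such as a snapshot can be limited to the disks that actually need it.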