
Author Archives: Chethan Kumar

Storage vMotion of a Virtualized SQL Server Database

vSphere Storage vMotion (svMotion) enables the live migration of disk files belonging to virtual machines (VMs). svMotion helps eliminate application downtime when the virtual disk files containing the applications’ data must be moved between storage devices for hardware maintenance, upgrades, storage load balancing, or proactive disaster recovery.

svMotion is the missing piece in liberating VMs and their associated files completely from the physical hardware on which they reside. Because of the importance of svMotion in the virtual landscape, we at VMware Performance Engineering Labs conducted a study involving the svMotion of the virtual disk files of a VM hosting a large SQL Server database. The focus of the study was to understand:

  • The impact on performance of the SQL Server database when migrating physical files of different database components such as data, index, and log.
  • The effect of the I/O characteristics of the database components on the migration time of the virtual disk containing the files of those components.

The results from the study show:

  • A consistent and predictable disk migration time that was largely influenced by the capabilities of the source and the destination storage hardware.
  • That the I/O characteristics of the database components do influence disk migration time (a rough model of this effect follows this list).
  • A 5% to 22% increase, depending on the VM load conditions, in the CPU cost of a transaction of the database workload while migrating a virtual disk containing the physical files of the database.
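How workload I/O can stretch migration time is easy to picture with a back-of-the-envelope model. The sketch below is purely illustrative and is not the paper's methodology: it assumes the copy rate is bounded by the slower of the source and destination arrays and reduced by the workload's concurrent writes to the disk being moved, and all numbers are invented.

```python
# Hypothetical model of virtual disk migration time. The bandwidth
# model and every number here are illustrative assumptions, not
# measurements from the study.
def migration_time_s(disk_gb, src_read_mbps, dst_write_mbps,
                     workload_write_mbps=0):
    # Copy rate is capped by the slower array; concurrent workload
    # writes (for example, to a busy log disk) steal copy bandwidth.
    copy_mbps = min(src_read_mbps, dst_write_mbps) - workload_write_mbps
    return disk_gb * 1024 / copy_mbps

print(f"idle data disk:       {migration_time_s(100, 400, 300):.0f} s")
print(f"write-heavy log disk: {migration_time_s(100, 400, 300, 100):.0f} s")
```

Under these assumptions, the same 100GB disk takes roughly half as long again to move when the workload keeps writing to it, which is consistent with the observation that I/O characteristics influence migration time.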

For more details, refer to the white paper “Storage vMotion of a Virtualized SQL Server Database.”

Single vSphere Host, a Million I/O Operations per Second

One of the essential requirements for a platform supporting enterprise datacenters is the capability to support the extreme I/O demands of applications running in those datacenters. A previous study has shown that vSphere can easily handle demands for high I/O operations per second. Experiments discussed in a recently published paper strengthen this assertion further by demonstrating that a vSphere 5 virtual platform can easily satisfy an extremely high level of I/O demand that originates from the hosted applications.

Results obtained from performance testing done at the EMC lab show that:

  • A single vSphere 5 host is capable of supporting a million+ I/O operations per second.
  • 300,000 I/O operations per second can be achieved from a single virtual machine.
  • I/O throughput (bandwidth consumption) scales almost linearly as the request size of an I/O operation increases (see the sketch after this list).
  • I/O operations on vSphere 5 systems with Paravirtual SCSI (PVSCSI) controllers use fewer CPU cycles than those with LSI Logic SAS virtual SCSI controllers.
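The bandwidth observation follows from the identity throughput = IOPS × request size; the short sketch below, using invented IOPS figures rather than the paper's measurements, shows why bandwidth grows almost linearly when IOPS falls off only slowly as requests get larger.

```python
# Throughput (bytes/sec) = IOPS x request size. The IOPS values are
# made up for illustration; only the near-linear trend matters.
def bandwidth_mb_per_s(iops, request_kb):
    return iops * request_kb / 1024  # KB/s -> MB/s

for request_kb, iops in [(4, 1_000_000), (8, 980_000), (16, 940_000)]:
    print(f"{request_kb:>2}KB @ {iops:,} IOPS -> "
          f"{bandwidth_mb_per_s(iops, request_kb):,.0f} MB/s")
```

Doubling the request size from 4KB to 8KB nearly doubles the bandwidth because the achievable IOPS drops only slightly.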

For more details, refer to the paper Achieving a Million I/O Operations per Second from a Single VMware vSphere 5 Host.

Performance Implications of Storage I/O Control-Enabled NFS Datastores

Storage I/O Control (SIOC) allows administrators to control the amount of access virtual machines have to the I/O queues on a shared datastore. With this feature, administrators can ensure that a virtual machine running a business-critical application has a higher priority to access the I/O queue than other virtual machines sharing the same datastore. In vSphere 4.1, SIOC was supported on VMFS-based datastores backed by iSCSI and Fibre Channel SANs. In vSphere 5, SIOC support has been extended to NFS-based datastores.

Recent tests conducted at VMware Performance Engineering lab studied the following aspects of SIOC:

  • The performance impact of SIOC: fine-grained management of the I/O queues resulted in a 10% improvement in the response time of the workload used for the tests.
  • SIOC’s ability to isolate the performance of applications with smaller request sizes: some applications, such as web and media servers, issue I/O with large request sizes (for example, 32KB), while others, such as OLTP databases, issue smaller requests (8KB or less). Test findings show that SIOC helped an OLTP database workload achieve higher performance when sharing the underlying datastore with a workload that used large I/O requests.
  • The intelligent prioritization of I/O resources: SIOC monitors virtual machines’ usage of the I/O queue at the host and dynamically redistributes any unutilized queue slots to the virtual machines that need them (a minimal sketch of this redistribution follows the list). Tests show that this process happens consistently and reliably.
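The redistribution behavior can be pictured with a minimal proportional-share sketch. This illustrates the general idea only; it is not the actual ESX algorithm, and the VM names, share values, and slot counts are invented.

```python
# Proportional-share allocation of a datastore's queue slots with
# redistribution of unused capacity, in the spirit of SIOC. Purely
# illustrative; not VMware's implementation.
def allocate(total_slots, vms):
    """vms: dict of name -> (shares, demand in queue slots)."""
    alloc = {name: 0.0 for name in vms}
    active = dict(vms)
    remaining = float(total_slots)
    while active and remaining > 0:
        total_shares = sum(shares for shares, _ in active.values())
        entitled = {name: remaining * shares / total_shares
                    for name, (shares, _) in active.items()}
        # VMs whose leftover demand fits inside their entitlement keep
        # only what they use; the surplus is redistributed next pass.
        satisfied = [name for name, (_, demand) in active.items()
                     if demand - alloc[name] <= entitled[name]]
        if not satisfied:
            for name in active:
                alloc[name] += entitled[name]
            break
        for name in satisfied:
            remaining -= active[name][1] - alloc[name]
            alloc[name] = active[name][1]
            del active[name]
    return alloc

# Equal shares, unequal demand: VM "B" leaves slots unused, so "A"
# absorbs them instead of being capped at a 50/50 split.
print(allocate(64, {"A": (1000, 60), "B": (1000, 10)}))  # A: 54, B: 10
```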

For the full paper, see Performance Implications of Storage I/O Control–Enabled NFS Datastores in VMware vSphere 5.

 

Troubleshooting Performance Related Problems in vSphere 4.1 Environments

The hugely popular Performance Troubleshooting for VMware vSphere 4 guide has been updated for vSphere 4.1. The document provides step-by-step approaches for troubleshooting the most common performance problems in vSphere-based virtual environments. The steps discussed in the document use performance data and charts readily available in the vSphere Client and esxtop to aid the troubleshooting flows. Each performance troubleshooting flow has two parts:

  1. How to identify the problem using specific performance counters.
  2. Possible causes of the problem and solutions for it.

New sections added to the document cover troubleshooting performance problems in resource pools on standalone hosts and in DRS clusters, additional troubleshooting steps for environments experiencing memory pressure (hosts with compressed and swapped memory), high CPU ready time on hosts that are not CPU saturated, environments sharing resources such as storage and network, and environments using snapshots.
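As one concrete example of the counter-driven approach the guide takes, the sketch below scans an esxtop batch capture (for example, `esxtop -b -d 5 -n 60 > esxtop.csv`) for VMs whose CPU ready time spikes. The column-name matching and the 5% starting threshold are assumptions made for illustration; verify both against your own capture and the guide's recommendations.

```python
import csv

# Flag VM groups whose CPU ready time peaks above a threshold in an
# esxtop batch capture. esxtop's CSV uses perfmon-style headers such
# as "\\host\Group Cpu(1234:vmname)\% Ready"; the substring matching
# below is an assumption to check against your own output.
READY_THRESHOLD = 5.0  # percent; a common starting point, not a hard rule

def high_ready_vms(path):
    with open(path) as f:
        reader = csv.reader(f)
        header = next(reader)
        ready_cols = [i for i, name in enumerate(header)
                      if "Group Cpu" in name and "% Ready" in name]
        worst = {}
        for row in reader:
            for i in ready_cols:
                try:
                    value = float(row[i])
                except (ValueError, IndexError):
                    continue
                worst[header[i]] = max(worst.get(header[i], 0.0), value)
    return {name: v for name, v in worst.items() if v > READY_THRESHOLD}

for name, value in high_ready_vms("esxtop.csv").items():
    print(f"{name}: peak %RDY {value:.1f}")
```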

The troubleshooting guide can be found here. Readers are encouraged to provide their feedback and comments on the performance community site at this link.

 

Virtualizing SQL Server-based vCenter database – Performance Study

vSphere is an industry-leading virtualization platform that enables customers to build private clouds for running enterprise applications such as SQL Server databases. Customers can expect near-native performance from their virtualized SQL databases when running in a vSphere environment. VMware vCenter Server, the management component of vSphere, uses a database to store and organize information related to vSphere-based virtual environments. This database can be implemented using SQL Server. Based on previous VMware performance studies involving SQL databases, it is reasonable to expect the performance of a virtualized SQL Server-based vCenter database to be similar to its native performance.

A study was conducted in the VMware performance engineering lab to validate this assumption. The results of the study show that:

  • The most resource-intensive operations of a virtualized SQL Server-based vCenter database perform at a level comparable to that in a native environment.
  • A SQL Server-based vCenter database managing a vSphere virtual environment of any scale can be virtualized on vSphere.
  • SQL databases, in general, perform at a near-native level when virtualized on vSphere 4.1.

Complete details of the experiments and their results can be found in this technical document.

For comments or questions on this article, please join me at voiceforvirtual.com.

Performance Implications of Storage I/O Control in vSphere Environments with Shared Storage

vSphere-based virtualized datacenters often employ a shared storage infrastructure to support clusters of vSphere hosts. Applications running in virtual machines (VMs) on vSphere hosts share the storage resources for their I/O needs. Application performance can be affected when VMs contend for shared storage resources. Without proper access control for sharing these resources, the performance of all applications tends to be affected in a non-trivial way. Storage I/O Control (SIOC), a new feature offered in VMware vSphere 4.1, provides a dynamic control mechanism for the proportional allocation of shared storage resources to VMs running on multiple hosts. The experiments conducted in VMware performance labs show that:

  • SIOC prioritizes VMs’ access to shared I/O resources based on the disk shares assigned to them.
  • If VMs do not fully utilize their portion of the allocated I/O resources on a shared datastore, SIOC redistributes the unutilized resources to the VMs that need them, in proportion to the VMs’ disk shares.
  • SIOC minimizes fluctuations in the performance of a critical workload during periods of I/O congestion. For the test case executed at VMware labs, limiting the fluctuations to a small range resulted in as much as a 26% performance benefit compared to the default configuration (Figure 1).

Figure 1. Application throughput with and without SIOC enabled


For further details, read the white paper titled “Managing Performance Variance of Applications Using Storage I/O Control” at http://www.vmware.com/resources/techresources/10120.

World Record TPC-H Performance on VMware vSphere 4

In what has become a regular phenomenon, VMware’s vSphere 4 has again proved that it can deliver the highest levels of performance for the most demanding applications. In a recent benchmarking effort by ParAccel, Inc., provider of the ParAccel Analytic Database (PADB), vSphere 4 delivered the industry’s first fully audited TPC-H benchmark result in an x86-based virtual environment. A cluster of 80 virtual machines running PADB, spread across 40 physical hosts, broke the existing world records for performance[1] and price/performance[2].

Some of the key highlights of this record performance are:

  • Ran 1,316,882 Composite Queries per Hour (QphH) @ 1,000GB, 13% higher than the next best performance result.
  • Used 40 HP servers, 37% fewer than the number used for the second-best performance result.
  • Achieved a price/performance of US $0.70 per QphH, 7.7 times better than that of the previous performance record holder.
  • Took 16 minutes and 23 seconds to load the 1TB database, a rate of 3.7TB per hour and a load time 8.7 times faster than that of the previous performance record holder (the arithmetic behind these figures is checked in the sketch after this list).
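The comparative figures in this list can be sanity-checked from the published numbers with simple arithmetic; the sketch below re-derives them (a quick check, not official TPC methodology).

```python
# Re-deriving the comparative figures from the published results.
qphh_new, qphh_prev = 1_316_882, 1_166_976           # QphH@1000GB
print(f"{(qphh_new / qphh_prev - 1) * 100:.0f}% higher throughput")   # ~13%

price_new, price_prev = 0.70, 5.42                    # USD per QphH
print(f"{price_prev / price_new:.1f}x better price/performance")      # ~7.7x

load_seconds = 16 * 60 + 23                           # 16 min 23 s per TB
print(f"{3600 / load_seconds:.1f} TB/hour load rate")                 # ~3.7
```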

This remarkable feat shows that vSphere enables efficient utilization of hardware resources through consolidation, and achieves this without compromising performance. Organizations can confidently virtualize the world’s most demanding applications and still realize significant cost and energy savings.

The setup used for this record run was as follows:

Servers:
40 x HP DL380 G6 servers, each with

  • 2 Intel Xeon X5560 2.8GHz processors (4 cores per socket)
  • 72GB memory
  • 8 x 300GB (10K RPM) internal SAS disks
  • 2 x HP PCIe dual-port Gigabit network adapters

Hypervisor:
VMware ESXi 4.0 Update 1

Virtual Machines:
Leader node: 1 virtual machine with

  • 4 vCPUs
  • 32GB memory
  • 4 x 279GB virtual disks
  • 5 virtual NICs
  • ParAccel Standard Linux, 64-bit
  • ParAccel Analytic Database, version 2.5

Compute nodes: 79 virtual machines, each with

  • 4 vCPUs
  • 32GB memory
  • 4 x 279GB virtual disks
  • 4 virtual NICs
  • ParAccel Standard Linux, 64-bit
  • ParAccel Analytic Database, version 2.5

For further details, read the full disclosure report here: http://www.tpc.org/tpch/results/tpch_perf_results.asp


[1] As of April 13, 2010: HP BladeSystem c-Class 128P RAC with Oracle Database 11g Release 2 Enterprise Edt., 1,166,976 QphH@1000GB, 5.42 USD per QphH@1000GB, available September 1, 2009.

[2] As of April 13, 2010: PRIMERGY RX300 S4 with EXASOL EXASolution 2.1, 1,018,321 QphH@1000GB, 1.18 USD per QphH@1000GB, available August 1, 2008.

Application Performance Improvement with DRS

Summary

VMware Distributed Resource Scheduler (DRS), a unique feature of VMware vSphere 4, dynamically allocates and balances computing resources in a cluster of vSphere hosts. DRS continuously monitors utilization across resource pools and intelligently allocates available resources among virtual machines based on policies specified by administrators. DRS not only provides efficient resource management but can also deliver significant VM performance gains. In the experiments conducted at EMC labs, we observed VM performance improvements as high as 47% with DRS when running virtualized SQL databases on a cluster of vSphere hosts.


Why DRS?

VMware vSphere provides a virtual platform to consolidate many servers onto fewer physical hosts. However, in such consolidation scenarios, unexpected spikes in the resource demands of the VMs can cause the total resource requirements to exceed the available resources on their host. A manual approach to mitigating this problem is to estimate the individual as well as aggregate resource demands upfront and place the VMs intelligently on the hosts based on that estimate.

But even if the hosts are balanced initially, there is no guarantee that the resource demands of the VMs will stay constant and that system loads on ESX will remain balanced. A change in workload behavior may cause resource demand to change, which can lead to contention among VMs for CPU cycles on some hosts while CPU cycles remain unused on other, lightly loaded hosts.

DRS provides an automated mechanism to manage these resource demands. It monitors the resource needs of the VMs at the time they are powered on and allocates resources by placing the VMs on the appropriate hosts in the cluster. If the resource demands change after the initial placement, DRS automatically relocates the VMs to hosts where the resources are readily available. VMs continue to get the resources their workloads demand and thus deliver the same performance they would if they were running on dedicated hardware.
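The mechanism can be illustrated with a deliberately simplified sketch: repeatedly move a VM from the most loaded host to the least loaded host while doing so narrows the gap. Real DRS weighs far more than this (entitlements, shares, migration cost, affinity rules); the code and its numbers are illustrative only.

```python
# Toy load balancer in the spirit of DRS. Illustrative only; not
# VMware's algorithm.
def rebalance(hosts, tolerance=10):
    """hosts: dict of host name -> list of per-VM CPU demands (%)."""
    moves = []
    while True:
        load = {h: sum(vms) for h, vms in hosts.items()}
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        gap = load[hot] - load[cold]
        if gap <= tolerance:
            return moves
        # Only a VM smaller than the gap actually reduces the imbalance.
        candidates = [v for v in hosts[hot] if v < gap]
        if not candidates:
            return moves
        vm = max(candidates)
        hosts[hot].remove(vm)
        hosts[cold].append(vm)
        moves.append((vm, hot, cold))

# An imbalanced pair of hosts, like the extremes of Table 1 below.
cluster = {"host1": [70, 70, 70, 70], "host4": [10, 10, 10, 10]}
for vm, src, dst in rebalance(cluster):
    print(f"migrate a {vm}%-busy VM from {src} to {dst}")
```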

Methodology

We created a DRS cluster consisting of 4 ESX hosts, all identical in hardware configuration (refer to “Configuration Details” for more information). On each host we created 4 VMs, and in each VM we installed SQL Server 2005 and a DVD Store version 2.0 (DS2) database.

We created 2 DS2 workload profiles:

  • Heavy: this profile drove vCPU utilization in a VM to 70%.
  • Light: this profile drove vCPU utilization in a VM to 10%.

We randomly assigned these profiles to the VMs as shown in Table 1. Though the number of VMs was the same on all hosts, differences in the application load led to CPU resource contention on some hosts and unused CPU resources on the remaining hosts.

Table 1. VM Workload Profiles

Host 1    Host 2    Host 3    Host 4
4H        4H        4L        4L
4H        3H / 1L   1H / 3L   4L
4H        2H / 2L   2H / 2L   4L
4H        2H / 2L   1H / 3L   1H / 3L
3H / 1L   3H / 1L   2H / 2L   4L
3H / 1L   3H / 1L   1H / 3L   1H / 3L
3H / 1L   2H / 2L   2H / 2L   1H / 3L

H – VM with 70% CPU utilization; L – VM with 10% CPU utilization

For each test case in Table 1, we ran a DS2 workload simultaneously in all VMs with DRS disabled. We collected the application throughput (Orders per Minute, or OPM) from all the VMs.

We repeated the experiments after enabling DRS. During each test case, DRS migrated a few VMs based on the resource demands across the hosts. The final balanced configuration achieved in each case was the same and is given in Table 2. We measured the aggregate throughput from all the VMs in this balanced configuration.

Table 2. Balanced DRS cluster

Host 1    Host 2    Host 3    Host 4
2H / 2L   2H / 2L   2H / 2L   2H / 2L


Key Findings

Figure 1 compares the aggregate throughput of all heavily loaded VMs in each of our test cases (Table 1) with and without DRS. We have not shown the performance of the lightly loaded VMs because there was no change in their performance.

Figure 1. Performance Gains with DRS

With DRS we observed:

  • 15% to 47% gains in aggregate performance for the cases tested.
  • The higher the resource demand and imbalance in the cluster, the higher the performance gain.
  • No performance impact when the cluster was already balanced.

This testing was the result of a joint effort between VMware and EMC. We would like to thank the Midrange Partner Solutions Engineering team at EMC, Santa Clara for providing access to the hardware, for the use of their lab, and for their joint collaboration throughout this project.

Configuration Details

ESX Hosts (4)
HP DL380
Dual-socket, quad-core Intel Xeon 5450, 3.0GHz
32GB of memory
Dual-port QLogic QLE2462 HBA

VC Server (1)
HP DL380
Dual-socket, quad-core Intel Xeon 5450, 3.0GHz
8GB of memory

Load Generators (4)
Dual-socket, dual-core servers
8GB of memory

Storage (1)
CX4-960 with 188 15K RPM FC disks

Virtual Platform:

VMware vSphere

Virtual Machines (16)
4 virtual CPUs
5GB memory
Windows Server 2003 x64 with SP2
SQL Server 2005 x64 with SP2
DVD Store version 2 (large database size)
http://www.delltechcenter.com/page/DVD+Store

Tuning
DRS aggressiveness threshold: 5 (most aggressive)

For more comments or questions, please join us on the VMware Performance Community website.

About the Authors:
Chethan Kumar is a member of the Performance Engineering team at VMware. Radhakrishnan Manga is a member of the Midrange Partner Solutions Engineering team at EMC.

350,000 I/O operations per Second, One vSphere Host

Summary

VMware vSphere includes a number of enhancements that enable it to deliver very high I/O performance. In this study, we demonstrate that vSphere can easily support even the extreme I/O demand made possible by new products like the Enterprise Flash Drives (EFDs) offered by EMC. In the experiments conducted at EMC labs, we were able to achieve just above 350,000 I/O operations per second with:

  • A single vSphere host with just three virtual machines running on it
  • Latencies under 2ms
  • An I/O block size of 8KB

What does such high throughput mean to customers? Consider this: the entire database of Wikipedia is supported by 20 MySQL servers, each 200GB to 300GB in size. On average, Wikipedia receives 50,000 http requests or 80,000 SQL queries per second [1], which translates to 4.3 billion hits per day. With the storage infrastructure used in our experiments, we could easily accommodate the entire database of Wikipedia and still be left with enough space. A single vSphere host driving more than 350,000 I/O requests per second could easily support the throughput requirements of Wikipedia.
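The daily figure follows from simple arithmetic on the quoted per-second rate:

```python
# Checking the arithmetic in the paragraph above.
print(f"{50_000 * 86_400:,} http requests/day")  # 4,320,000,000, ~4.3 billion
```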

Background

In late May 2008, we published a blog article on achieving 100K I/O operations per second with ESX 3.5. To achieve that, we used 495 15K RPM Fibre Channel disks spread across three CX3-80 arrays. To push the envelope further with vSphere, we needed more storage bandwidth. It would have taken approximately 1,750 15K RPM Fibre Channel drives with 120 disk array enclosures to provide 350,000 I/O operations per second of throughput. Adding redundancy to the storage would increase the numbers further, to as many as 3,500 drives for a RAID 1/0 configuration, doubling the entire SAN infrastructure.

Instead, only 30 EFDs housed in three CX4-960 arrays provided enough storage bandwidth for vSphere to drive just above 350,000 I/O requests per second.
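The drive-count estimate is straightforward back-of-the-envelope math. The roughly 200 IOPS per 15K RPM Fibre Channel drive used below is a common rule of thumb assumed for illustration, not a figure measured in this study.

```python
import math

# Drives needed to sustain the target IOPS, assuming ~200 IOPS per
# 15K RPM FC drive (a rule of thumb, not a measured value).
target_iops = 350_000
iops_per_drive = 200
drives = math.ceil(target_iops / iops_per_drive)
print(drives)        # 1750 drives, matching the estimate above
print(drives * 2)    # ~3500 drives with RAID 1/0 mirroring
```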


I/O Workload

We could have achieved more I/O operations per second with a smaller block size, but we focused our studies on an 8KB block because it is the most representative of real applications. We chose an I/O pattern that was 100% random.

Key Findings

  • 3 VMs on one vSphere host supported 350,000 I/O operations per second with an 8KB block size (Figure 1).
  • A single VM with 2 vCPUs and 4GB of memory provided just under 120,000 I/O operations per second with an 8KB block size.
  • I/O latency as measured in ESX was just under 2ms.
  • VMware’s new paravirtualized SCSI adapter (PVSCSI) offered a 12% improvement in throughput at an 18% lower CPU cost compared to the LSI virtual adapter.

Figure 1. Scaling I/O performance through vSphere


We are documenting all the experiments in detail in a white paper that will be posted on the VMware website. We encourage readers to refer to that white paper for more details.

This testing was the result of a joint effort between VMware and EMC. We would like to thank the Midrange Partner Solutions Engineering team at EMC, Santa Clara for providing access to the hardware, for the use of their lab, and for their joint collaboration throughout this project.

For more comments or questions, please join us on the VMware Performance Community website.

About the Authors:
Chethan Kumar is a member of the Performance Engineering team at VMware. Radhakrishnan Manga is a member of the Midrange Partner Solutions Engineering team at EMC.


Virtualized SQL Server Performance: Scalable and Reliable

Database workloads are very diverse. While most database servers are lightly loaded, larger database workloads can be resource-intensive, exhibiting high I/O rates or consuming large amounts of memory. With improvements in virtualization technology and hardware, even servers running large database workloads run well in virtual machines. Servers running Microsoft SQL Server, among the top database server platforms in the industry today, are no exception.

An important consideration in SQL Server consolidation scenarios is application performance when packing multiple virtual machines on a single hardware platform. Application performance in virtual machines should continue to meet or exceed required service levels. That is to say, the virtual platform should:

  • Be scalable.
  • Ensure that all virtual machines get resources in proportion to their load levels up to specified resource limits.
  • Provide performance isolation for each virtual machine running on a host.
  • Ensure that the overall load of a host will have minimal impact on the performance of applications running in individual virtual machines on that host.

We recently published a white paper, "SQL Server Workload Consolidation," that demonstrates the ability of VMware® ESX 3.5 to scale while guaranteeing fairness and isolation under a demanding SQL Server load.