Virtualization adds new software layers and new types of interactions between SQL Server and the hardware components. While the general methodology for monitoring and troubleshooting SQL Server performance does not change, VMware provides additional tools for monitoring and troubleshooting at the physical host level. This is the first of a series of Monitoring and Troubleshooting SQL Server performance on VMware blogs. In this blog, I'd like to discuss how to effectively monitor SQL Server performance on VMware.
SQL Server and Windows provide a number of tools for monitoring performance. Those are still the primary tools to use for identifying problem areas, and resource bottlenecks with SQL Server running on VMware. The methodologies stay the same as performance monitoring SQL Server in a physical environment.
- Performance Monitor: monitors overall resource usages. SQL Server specific counters: \SQLServer:*\*
- SQL Server Profiler: monitors performance at T-SQL statement level. Common events of interests are: SQL:BatchStarting/SQL:BatchCompleted, RPC:Starting/RPC:Completed, Errors and Warnings (All), P:StmtStarting/Completed, SQL:StmtStarting/Completed
- DMVs: monitors the internal health of SQL Server, sys.dm_*
For more information on using these tools for monitoring and troubleshooting SQL Server, see Troubleshooting Performance Problem in SQL Server 2008.
VMware provides two main tools for observing and collecting performance data at the host-level: the vSphere Client and the esxtop/resxtop utilities.
vSphere Client:
- GUI interface, primary tool for observing performance and configuration data for one or more ESX/ESXi hosts
- Does not require high levels of privilege to access the data
esxtop/resxtop
- Command line utility that runs in interactive, batch, or replay mode
- Gives access to detailed performance data of a single ESX/ESXi host
- Provides fast access to a large number of performance metrics
-
Requires root-level access
Figure 1. Performance Chart Views with vSphere Client |
Figure 2. resxtop CPU Metrics |
Among the performance data exposed by host-level monitoring tools, the following is a list of key metrics that can help you quickly isolate issues to a specific resource area such as CPU, memory, storage, or network. See VMware Communities: Interpreting esxtop Statistics and vCenter Performance Counters for a full list of counters.
Note** The measurement units reported in esxtop/resxtop and vSphere Client may be different. See the above VMware community links for details.
Resource |
Metric (esxtop/resxtop) |
Metric (vSphere Client) |
Host/Virtual Machine |
Description |
CPU |
%USED |
Used |
Both |
CPU used over the collection interval (%) |
%RDY |
Ready |
Virtual Machine |
CPU time spent in ready state |
|
%SYS |
System |
Both |
Percentage of time spent in the ESX Server VMKernel |
|
Memory |
Swapin, Swapout |
Swapinrate, Swapoutrate |
Both |
Memory ESX host swaps in/out from/to disk (per virtual machine, or cumulative over host) |
MCTLSZ (MB) |
vmmemctl |
Both |
Amount of memory reclaimed from resource pool by way of ballooning |
|
Disk |
READs/s, WRITEs/s |
NumberRead, NumberWrite |
Both |
Reads and Writes issued in the collection interval |
DAVG/cmd |
deviceLatency |
Both |
Average latency (ms) of the device (LUN) |
|
KAVG/cmd |
KernelLatency |
Both |
Average latency (ms) in the VMkernel, also known as "queuing time" |
|
GAVG/cmd |
TotalLatency |
Both |
Average latency (ms) in the guest. GAVG = DAVG + KAVG |
|
Network |
MbRX/s, MbTX/s |
Received, Transimitted |
Both |
Amount of data transmitted per second |
PKTRX/s, PKTTX/s |
PacketsRx, PacketsTx |
Both |
Packets transmitted per second |
|
%DRPRX, %DRPTX |
DroppedRx, DroppedTx |
Both |
Dropped packets per second |
Of the CPU counters, the total used time indicates system load. Ready time indicates overloaded CPU resources. A significant swap rate in the memory counters is a clear indication of a shortage of memory, and high device latencies in the storage section point to an overloaded or misconfigured array. Network traffic is not frequently the cause of most database performance problems, but it is an area to monitor to make sure the network bandwidth is there.
When SQL Server is running in a virtual environment, any time-based measurements reported with SQL Server monitoring tools or Windows Perfmon may be inaccurate if the host machine resources are over-committed. The accuracy of the in-guest tools would depend on the guest OS and kernel version being used, and the total load of the VMware ESX host. The vSphere tools provide more accurate reporting on those measurements. See vSphere Performance Troubleshooting for more information. For effective monitoring and troubleshooting of SQL Server performance problem on VMware use the SQL Server tools and Perfmon for monitoring SQL Server resource usage, server state, and internal health. Focus on identifying performance bottlenecks instead of time-base measurements and then further correlate performance data collected from the host using vSphere tools. By focusing on key performance metrics you can quickly isolate issues to a particular resource area.
For example, for identifying a CPU bottleneck, instead of looking at the % Processor Time Perfmon counter, you can focus on the Process Queue Length counter. If there are a high number of processor queues and SQL Server Wait Statistics is showing a high number of waiters for worker, it's obvious a processor bottleneck exists. You can then use vSphere Client to verify:
- If the number of virtual processors allocated to the SQL Server virtual machine is sufficient.
- If the Shares, reservation, and limits for the virtual machine CPU Resource Allocation are configure properly.
- If there is a host overload with sustained high %used, and high %rdy.
Similarly, you can effectively identify a disk bottleneck if the Disk Queue Length is high and a high number of SQL Server users are waiting on PAGEIOLATCH_EX, PAGEIOLATCH_SH. After a resource bottleneck is identified on the SQL Server virtual machine you can correlate statistics collected from the vSphere Client and esxtop/resxtop to identify any host level configuration issue that's affecting the SQL Server virtual machine performance.
Thanks for reading. On my next blog, I will discuss identifying and solving some common SQL Server on VMware performance problems. Stay tuned…
-Wanda
Wanda He, Technical Solutions Architect