Poor storage performance is generally the result of high I/O latency, but what causes high latency and how do you address it? There are a lot of things that can cause poor storage performance…
– Undersized storage arrays/devices unable to provide the needed performance
– I/O Stack Queue congestion
– I/O Bandwidth saturation, Link/Pipe Saturation
– Host CPU Saturation
– Guest Level Driver and Queuing Interactions
– Incorrectly Tuned Applications
– Undersized storage arrays (Did I say that twice?)
As I mentioned in the previous post, the key storage performance indicators to look out for are (1) high device latency (DAVG consistently greater than 20 to 30 ms) and (2) high kernel latency (KAVG greater than 2 ms). Once you have identified that you have high latency, you can proceed to understanding why the latency is high and what is causing the poor storage performance. In this post, we will look at the top reason for high device latency.
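If you capture esxtop in batch mode, you can spot these thresholds without watching the live screen. The sketch below is a minimal illustration of that idea, assuming a batch-mode CSV export whose header row contains the device and kernel latency counters as "MilliSec/Command" columns; verify the exact counter names and adjust the substrings to match your own export.

```python
# latency_check.py - crude sketch: flag high average DAVG/KAVG in an esxtop batch-mode CSV.
# Assumptions to verify against your own export: the header row contains counter names
# with substrings like "Device MilliSec/Command" (DAVG) and "Kernel MilliSec/Command" (KAVG).
import csv
import sys

DAVG_LIMIT_MS = 25.0   # middle of the 20-30 ms guidance above
KAVG_LIMIT_MS = 2.0

def average(values):
    return sum(values) / len(values) if values else 0.0

def main(path):
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header, samples = rows[0], rows[1:]

    for idx, name in enumerate(header):
        if "MilliSec/Command" not in name:
            continue
        vals = [float(r[idx]) for r in samples if len(r) > idx and r[idx].strip()]
        avg = average(vals)
        if "Device MilliSec" in name and avg > DAVG_LIMIT_MS:
            print(f"High DAVG ({avg:.1f} ms): {name}")
        elif "Kernel MilliSec" in name and avg > KAVG_LIMIT_MS:
            print(f"High KAVG ({avg:.1f} ms): {name}")

if __name__ == "__main__":
    main(sys.argv[1])
```

It simply averages each latency column over the whole capture window and prints the counters that exceed the guidance, which is usually enough to tell you which devices deserve a closer look.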
The top reason for high device latency is simply not having enough storage hardware to meet your application’s needs (yes, I have said it a third time now); that is a surefire way to have storage performance issues. It may seem basic, but too often administrators size their storage only on the capacity they need to support their environment and not on the performance (IOPS, latency, throughput) they need. When sizing your environment, you really should consult your application and storage vendors’ best practices and sizing guidelines to understand what storage performance your application will need and what your storage hardware can deliver.
How you configure your storage hardware, the type of drives you use, the RAID configuration, the number of disk spindles in the array, etc. will all affect the maximum storage performance your hardware is able to deliver. Your storage vendor will be able to provide the most accurate model and advice for the particular storage product you own, but if you need some rough guidance you can use the chart below.
The slide shows the general IOPs and Read & Write throughput you can expect per spindle depending on the RAID configuration and/or drive type you have in your array. Also frequently I’m asked what is the typical I/O profile for a VM, the guidance varies greatly depending on the applications running in your environment, but a “typical” I/O workload for a VM would roughly be 8KB I/O size, 80% Random, 80% Read. Storage intensive applications like Databases, Mail Servers, Media Streaming, … have their own I/O profiles that may differ greatly from this “typical” profile.
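To make the sizing math concrete, here is a small back-of-the-envelope sketch that folds the RAID write penalty into a spindle-count estimate. The per-spindle IOPS figures and write penalties below are common rules of thumb, not values taken from the chart or from any vendor; substitute your storage vendor’s numbers when doing real sizing.

```python
# spindle_estimate.py - rough spindle count for a target workload, rule-of-thumb numbers only.
import math

# Typical write penalties per RAID level (each front-end write becomes N back-end I/Os).
WRITE_PENALTY = {"RAID0": 1, "RAID1/10": 2, "RAID5": 4, "RAID6": 6}
# Rule-of-thumb IOPS per spindle; use your vendor's figures for real sizing.
SPINDLE_IOPS = {"7.2K SATA": 80, "10K SAS": 130, "15K SAS": 180}

def spindles_needed(frontend_iops, read_pct, raid, drive):
    read_frac = read_pct / 100.0
    write_frac = 1.0 - read_frac
    # Writes hit the back end multiple times depending on the RAID level.
    backend_iops = frontend_iops * (read_frac + write_frac * WRITE_PENALTY[raid])
    return math.ceil(backend_iops / SPINDLE_IOPS[drive])

if __name__ == "__main__":
    # Hypothetical example: ~5,000 IOPS of the "typical" 80% read profile on RAID5 with 10K SAS.
    print(spindles_needed(frontend_iops=5000, read_pct=80, raid="RAID5", drive="10K SAS"))
```

In that hypothetical example, 5,000 front-end IOPS at 80% read on RAID 5 becomes roughly 8,000 back-end IOPS, or about 62 10K SAS spindles; capacity alone would never have suggested that number, which is exactly why sizing on performance matters.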
One good way to make sure your storage is able to handle the demands of your datacenter is to benchmark it. There are several free and open source tools, like IOmeter, that can be used to stress test and benchmark your storage. If you haven’t already taken a look at the I/O Analyzer tool delivered as a VMware Fling, you might want to take a peek at it. I/O Analyzer is a virtual appliance that provides a simple and standardized approach to storage performance analysis in VMware vSphere virtualized environments ( http://labs.vmware.com/flings/io-analyzer ).
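If you just want a very rough sanity check from inside a guest before reaching for IOmeter or I/O Analyzer, a crude probe like the sketch below measures random 8 KB read latency against an existing test file. It is purely illustrative: it assumes a Unix-like guest (os.pread), reads may be served from the OS page cache, and it is in no way a substitute for a proper benchmarking tool.

```python
# random_read_probe.py - crude 8 KB random-read latency probe against a test file.
# Illustrative only: results may reflect the page cache rather than the array.
import os
import random
import statistics
import sys
import time

IO_SIZE = 8 * 1024      # 8 KB, the "typical" VM I/O size mentioned above
SAMPLES = 1000

def probe(path):
    size = os.path.getsize(path)
    latencies_ms = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(SAMPLES):
            offset = random.randrange(0, max(size - IO_SIZE, 1))
            start = time.perf_counter()
            os.pread(fd, IO_SIZE, offset)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    p95 = sorted(latencies_ms)[int(0.95 * SAMPLES)]
    print(f"avg {statistics.mean(latencies_ms):.2f} ms, p95 {p95:.2f} ms")

if __name__ == "__main__":
    probe(sys.argv[1])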
Also, when sizing your storage, make sure your workloads are balanced “appropriately” across the paths in the environment, across the controllers and storage processors in the array, and across the appropriate number of spindles in the array. I’ll talk a bit more about what “appropriately” balanced means later in this series, as it varies depending on your storage array and your particular goals/needs.
Simply sizing your storage correctly for the expected workload, in terms of both capacity and performance, will go a long way toward making sure you don’t run into storage performance problems and that your device latency (DAVG) stays below that 20-30 ms guidance. There are other things to consider, which we will see in future posts, but sizing your storage is key.
PS. This week I’m at the HP Discover event presenting with Emulex, a VMware partner, Session TB#3258: The benefits and right practices of 10GbE networking with VMware vSphere 5 @ 2:45, June 6th. If you are at the show, come by and say hi. I’ll also be at the VMware booth in the main expo hall most days.
Continue to Troubleshooting Storage Performance – Part 3:
/vsphere/2012/06/troubleshooting-storage-performance-in-vsphere-part-3-ssd-performance.html
Previous Troubleshooting Storage Performance post:
http://blogs.vmware.com/vsphere/2012/05/troubleshooting-storage-performance-in-vsphere-part-1-the-basics-.html