AUTHOR: Craig Stanley (@benchmarkguru)
The commonly stated mantra of benchmarking is that “you can’t manage what you can’t measure”. While that is certainly a truism, a corollary I’d like to add is that “you can’t measure what you can’t quantify”. Figuring out what KPIs you need to track is usually not too difficult, but determining how to quantify some KPIs can seemingly involve black magic, smoke and mirrors. It can be mystifying, to say the least. This article discusses the benchmarking measures and quantification behind VMware’s Infrastructure Move-The-Needle Vision Benchmark and will, hopefully, shed a little light on the process.
The VMware Vision Benchmarking Team provides a customer service benchmark analysis called a Move-The-Needle. We determine a few basic KPIs, quantify where a customer’s performance is on these KPIs, measure the gap between where the customer is and where the best performing peers are and calculate the potential benefit of closing that gap, or moving-the-needle. It is important to note that some industries are more heavily invested in virtualization than others. In some cases, even the best performers may be woefully behind the technology curve in terms of leveraging virtualization into significant operational savings. Analytical insight into the technological maturity of the industry in addition to the best performing peers is invaluable in putting the information into an actionable perspective.
In the Infrastructure Move-The-Needle Benchmark, we look at five Key Performance Indicators:
- Percentage of x86 Physical Servers Virtualized
- Percentage of Desktops Virtualized
- Percentage of Virtual Machines that are DR Protected
- Virtual Machine to VM Administrator Ratio
- Number of Days Needed to Provision a Virtual Machine
These KPIs were initially selected based on feedback from large IT organizations as the basic metrics they needed to track and compare in order to measure virtualization maturity. These have the added benefit of being easy to collect and track consistently over time and between competing organizations. In the chart below, we illustrate the needle position and the impact of moving it to the best performing peer levels.
- x86 Percentage Virtualized – What is the percentage of the total number of x86 servers that are virtualized? This metric is used to gauge the overall integration of server virtualization in the organization and is important, as the road to a private cloud requires a high level of server virtualization. Additionally, significant CapEx and OpEx savings can be realized as virtualization drives up the utilization density while reducing the number of physically managed, energy consuming devices. Based on VMware ROI figures, we estimate that an organization can save approximately $3,400 for every physical server they can virtualize. About 84% of the potential savings is CapEx with 16% recurring annually. An added benefit is that physical servers may be able to be redeployed to offset growth or may be powered-off until needed to save on operating costs.
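To make the arithmetic concrete, here is a minimal sketch of how the server virtualization gap could be turned into a dollar figure. The $3,400 per server and the 84%/16% split come from the figures above. The function name, the sample inputs, and the reading of the split as one-time CapEx versus savings that recur each year are assumptions for illustration, not the benchmark's actual model.

```python
# Figures quoted in the article.
SAVINGS_PER_SERVER = 3_400   # USD per physical server virtualized
CAPEX_SHARE = 0.84           # share of the savings that is CapEx
RECURRING_SHARE = 0.16       # share that recurs annually

def server_virtualization_savings(total_servers, current_pct, peer_pct):
    """Estimate savings from closing the gap to the best performing peers.

    Assumption: the gap is the extra share of the x86 estate that would be
    virtualized if the customer matched the peer percentage.
    """
    gap_pct = max(peer_pct - current_pct, 0)  # no gap if already ahead of peers
    servers_to_virtualize = round(total_servers * gap_pct / 100)
    total = servers_to_virtualize * SAVINGS_PER_SERVER
    return {
        "servers_to_virtualize": servers_to_virtualize,
        "total_savings": total,
        "capex_savings": round(total * CAPEX_SHARE),
        "recurring_annual_savings": round(total * RECURRING_SHARE),
    }

# Hypothetical customer: 1,000 x86 servers, 40% virtualized, peers at 70%.
result = server_virtualization_savings(total_servers=1_000, current_pct=40, peer_pct=70)
print(result)
```

With these made-up inputs, the gap is 300 servers, or roughly $1.02 million in potential savings.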
- Desktop Percent Virtualized – What is the percentage of the total number of desktops that are virtualized? This metric measures the depth of the implementation of VDI (Virtual Desktop Infrastructure) initiatives, which also figure heavily in the future cloud ubiquitous computing model. The goal of desktop virtualization is threefold: (1) remove as much of the physical end-user support as possible, (2) consolidate into significantly fewer managed and maintained user images, and (3) create an environment where the user experience is not tied to any one physical I/O device or location.
Initially, an investment in VDI is not likely to save or replace much hardware or software expense. End users still need an output device, be it a thin client on a PC or other device, and investment must be made in servers to support desktop virtualization. However, image and application support requirements such as patching, config management, etc., may be significantly reduced. We estimate that each desktop virtualized may carry a potential savings of about $500.
There are hurdles to VDI, as evidenced by the relatively low adoption rate we see in the data, but VDI may prove to be one of those areas of potential strategic advantage for organizations seeking to deliver applications on demand and support multiple device form factors.
- Percentage of VMs under DR Protection – How many of your VMs are under a DR protection plan? This metric yields important information about the type of workloads supported on VMs. A low percentage suggests that the VM workloads are not Tier 1, since they are not part of a DR implementation, while a higher percentage suggests the VMs may be identified as Tier 1. This metric tends to provide better insight than a simple count of Tier 1 applications on VMs. The action of putting them under DR protection is better proof that these applications are considered critical based on function, not just name or business unit ownership.
Based on the VMware ROI data, we’ve estimated a potential savings of about $550 per hour of avoided downtime. Obviously, your results may vary depending on how much impact a particular Tier 1 application outage would have, but this can give you some idea of the potential savings or, conversely, the potential exposure.
- VM to Admin Support Ratio – How many VMs do you support per VM Administration FTE? This is a commonly used and asked-for metric as it illustrates a support-FTE utilization ratio. The basic tenet of virtualization is that you are able to support more VMs than physical machines with the same number of FTEs. We would expect to see this number in the hundreds of VMs per Admin FTE, depending on the virtualization maturity of the organization and the number of VMs. It is important to note that the denominator is an FTE (Full Time Equivalent), which could be made up of fractions of multiple admins’ time and does not necessarily map one-to-one to physical headcount.
Moving the needle on this metric is measured in fewer FTEs needed to deliver the same quantity of support. While reducing the number of FTEs is one interpretation of this result, most organizations seek to leverage the freed-up resources to handle other projects or growth.
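As a rough illustration of "fewer FTEs for the same quantity of support", the sketch below computes the FTEs needed at the current ratio and at the peer ratio and takes the difference. The function name and the sample numbers are hypothetical, and the benchmark itself may calculate this differently.

```python
def ftes_freed(total_vms, current_ratio, peer_ratio):
    """FTEs freed up if the same VM count were supported at the peer ratio.

    Ratios are VMs per admin FTE. An FTE can be fractional, since it may be
    made up of parts of several admins' time.
    """
    current_ftes = total_vms / current_ratio
    target_ftes = total_vms / peer_ratio
    return max(current_ftes - target_ftes, 0.0)  # nothing freed if already ahead

# Hypothetical customer: 2,000 VMs at 250 VMs per FTE, peers at 400 per FTE.
freed = ftes_freed(total_vms=2_000, current_ratio=250, peer_ratio=400)
print(f"{freed:.1f} FTEs freed for other projects or growth")
```

In this example the customer needs 8 FTEs today and would need 5 at the peer ratio, so 3 FTEs could be redirected.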
Something else to keep in mind is that a high ratio here, even higher than your peer group, may not be a good thing. This is where the analysis of multiple metrics comes into play. In many cases, an analysis cannot be based on a single metric or piece of information, and this metric is a good example of that. Let’s say I support 1,000 VMs per Admin and my peers only support 500 VMs per Admin. Certainly, my VM:Admin ratio (productivity) is better, but since this metric does involve a limited resource, i.e., labor, what does my labor performance look like? Does it take me several days to provision a VM? If so, then maybe I’m stretched too thin at a 1,000 to 1 ratio.
Server utilization is another good example of these dependent KPIs. If all my servers are always running at 100%, is that necessarily the best way to be operating? If I can’t deliver the workloads to the required SLA, then no, it isn’t.
- Days to Provision a VM – How many days does it take to provision a VM, including requests, approvals and implementation? This metric reflects two performance factors. The first is the level of automation and repeatable processes in place to be able to quickly provision a VM and deploy it into production. The second is the degree of standardization, policies and governance that enables a VM request to be quickly approved, procured and provisioned. We find that organizations farther along in their cloud journey have the processes and governance in place to be able to procure and provision a VM into production in only a few days, much faster than organizations without them.
This metric is also useful as the counterpart to the VM:Admin ratio as noted earlier and it balances speed to deliver against the resource level needed to deliver that speed. For example, if you can procure and provision a VM in a day while still maintaining a higher than average VM:Admin ratio, then you are likely to be more effective and efficient than peers. But if your provisioning time is lengthy with a high VM:Admin ratio, it may be indicative of a staffing shortage or lack of automation.
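The pairing of these two metrics can be sketched as a simple decision rule. The function name and the choice to compare against peer values as the cut-off for "high" and "lengthy" are assumptions made for illustration. In practice an analyst would weigh more context than two numbers.

```python
def read_admin_load(vm_admin_ratio, peer_ratio, days_to_provision, peer_days):
    """Interpret the VM:Admin ratio alongside days to provision a VM.

    Assumption: "high" and "lengthy" mean worse than the peer value.
    """
    high_ratio = vm_admin_ratio > peer_ratio
    slow_provisioning = days_to_provision > peer_days

    if high_ratio and not slow_provisioning:
        return "likely more effective and efficient than peers"
    if high_ratio and slow_provisioning:
        return "possible staffing shortage or lack of automation"
    return "no conclusion from these two metrics alone"

# The 1,000:1 versus 500:1 example from earlier, with hypothetical provisioning times.
print(read_admin_load(1_000, 500, days_to_provision=1, peer_days=3))
print(read_admin_load(1_000, 500, days_to_provision=7, peer_days=3))
```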
The value to the organization of reducing this number is that VMs and, consequently, additional applications can be deployed faster, helping IT support a more agile, robust enterprise. We arrive at a savings figure based on $1,000 per day in opportunity savings resulting from being able to provision your virtualizable physical servers faster. While this is a typical soft or indirect saving, and the actual saving per day might vary, it does give us a way to assign a dollar saving to this improvement.
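One possible way to turn the $1,000 per day figure into a number is sketched below. The article does not spell out the exact formula, so multiplying the days saved by the number of servers to be provisioned is an assumption, as are the function name and the sample inputs.

```python
OPPORTUNITY_SAVINGS_PER_DAY = 1_000  # USD per day, from the article

def provisioning_savings(current_days, peer_days, servers_to_provision):
    """Soft savings from provisioning faster.

    Assumption: the per-day figure applies to each server provisioned,
    for each day of provisioning time saved.
    """
    days_saved = max(current_days - peer_days, 0)  # no savings if already faster
    return days_saved * OPPORTUNITY_SAVINGS_PER_DAY * servers_to_provision

# Hypothetical customer: 10 days today, peers at 3 days, 50 servers to provision.
savings = provisioning_savings(current_days=10, peer_days=3, servers_to_provision=50)
print(f"${savings:,} in opportunity savings")
```

Because this is an indirect saving, the result is best treated as an order-of-magnitude estimate and not as a budget line.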
These are but a few of the many metrics that can be used to measure performance, utilization, efficiency and effectiveness. In this Infrastructure Move-The-Needle, we identify and quantify five KPIs that can drive savings and value back to the IT organization and the enterprise. By knowing where we are, and knowing where the best performers are, we can close that gap and reap the savings. You can get there from here by moving the needle.
Craig Stanley is the Benchmarking Practice Lead for the VMware Vision Team. You can follow him on Twitter @benchmarkguru.