Happy workloads deliver great service to the line of business and efficient resource management: in the best scenario, a “win-win” for IT and the business stakeholder. But the pursuit of perfection comes at a cost, so “perfect” and “optimal” workload placement can look quite different. In this post we’ll look at the hidden costs of perfection and at three tools that make optimal workload placement easy. An optimal workload placement strategy supports great application performance for the workload in question without degrading any other workload; weighs the cost-benefit tradeoffs of moving at the first sign of degradation; and ultimately drives gains in resource utilization.
The trouble with perfection arises when workload balance is weighted too heavily as the single measure of goodness. Too many approaches move a workload the moment performance bogs down, without giving enough consideration to the full impact on the destination host and all the workloads already running there. And few tools accurately weigh the cost of the move to the workload being moved itself, even though staying put may be the best choice in many cases.
Workload management starts with two key questions: where is the best place for a new workload, and how should existing workloads be arranged so that each performs well without being negatively impacted?
VMware’s Distributed Resource Scheduler (DRS) answers the placement question by deploying a new workload on the host with the best resource availability at that moment. When unplanned events degrade performance, DRS can either move the workload automatically or notify IT with a recommendation, depending on the workflow you’ve chosen. The latter lets you weigh factors external to the workload itself, such as changes in the business.
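To make the placement idea concrete, here is a minimal sketch in the spirit of DRS initial placement: put the new VM on the host with the most headroom at deployment time. The host names, capacities, and scoring function are all invented for illustration; real DRS weighs many more dimensions.

```python
# Illustrative only: pick the host that leaves the most headroom after
# placing the VM. Not VMware's actual placement algorithm.

def place_vm(hosts: dict, cpu_demand: float, mem_demand: float) -> str:
    """hosts maps name -> (free_cpu_ghz, free_mem_gb); returns the chosen host."""
    # keep only hosts that can actually fit the VM
    candidates = {name: (cpu, mem) for name, (cpu, mem) in hosts.items()
                  if cpu >= cpu_demand and mem >= mem_demand}
    # score by the tightest remaining dimension after placement; highest wins
    return max(candidates,
               key=lambda n: min(candidates[n][0] - cpu_demand,
                                 candidates[n][1] - mem_demand))

cluster = {"esx-01": (8.0, 32.0), "esx-02": (16.0, 64.0), "esx-03": (4.0, 16.0)}
print(place_vm(cluster, cpu_demand=4.0, mem_demand=16.0))   # esx-02
```

The "most headroom" heuristic is only one possible score; a production scheduler also accounts for reservations, affinity rules, and network locality.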
At the heart of the cost-benefit analysis is the question: how aggressively do I want to balance? The best approach bases decision-making and policy on a deep understanding of each VM and each cluster, including history and patterns over time. This is a key difference between VMware and its competitors: we don’t move workloads just because a single host is momentarily out of balance. That a workload isn’t perfectly balanced doesn’t mean there’s a performance problem. And even when there is, if it’s minor, the cost of moving the workload for the sake of perfect balance may not offer enough benefit to be worthwhile. Is incremental benefit at the margins worth it? For your highest-priority workloads, perhaps; across the board, likely not.
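The cost-benefit logic above can be sketched in a few lines. Everything here is hypothetical, not VMware’s actual algorithm: the metrics, the `aggressiveness` knob, and the thresholds are invented to show the shape of the decision, namely that a move happens only when its projected benefit clearly exceeds the cost of the move itself.

```python
# Hypothetical sketch of a DRS-style cost-benefit check. All names and
# numbers are illustrative.

def should_migrate(current_contention: float,
                   projected_contention: float,
                   migration_cost: float,
                   aggressiveness: float = 1.0) -> bool:
    """Return True when the benefit of moving clearly exceeds its cost.

    current_contention / projected_contention: e.g. CPU ready time as a
    fraction of demand on the current vs. candidate host.
    migration_cost: estimated performance hit of the move itself.
    aggressiveness: how eagerly to chase balance (higher = more moves).
    """
    benefit = current_contention - projected_contention
    return benefit * aggressiveness > migration_cost

# A minor imbalance is not worth the move...
print(should_migrate(0.08, 0.06, 0.05))   # False: small benefit, move skipped
# ...but a badly contended workload is.
print(should_migrate(0.40, 0.05, 0.05))   # True: large benefit, move approved
```

Raising `aggressiveness` models a policy that chases balance harder; lowering it models the “good enough” stance the post advocates.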
Cost-benefit analyses need to consider VM “happiness”—which we’ll loosely define for our purposes here as the absence of contention. Contention arises when multiple VMs compete for the same underlying physical resources. Policies that prioritize one over another in that event can minimize the issue, but unplanned contention still arises, and when it does, application performance suffers.
Understand that moving a workload carries a performance tradeoff of its own. Some workloads don’t get along with each other; others do, and we call that affinity. Much like a team of people, each individual performs best when the team works well together. The time-honored norm of treating roughly 70 percent CPU utilization as the trigger to move a workload isn’t flawed; it’s just not enough. The best approach looks at resources in the aggregate and considers historical data and patterns in the application’s performance over time. Simply put: maybe that workload hasn’t been perfectly balanced since you got back from lunch, but does history show it’s likely to settle down again by end of day?
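The difference between a flat 70-percent trigger and a history-aware one can be shown in a toy example. The sample data and window size are invented; the point is that a momentary spike the workload routinely recovers from should not, on its own, fire a migration.

```python
# Illustrative only: a flat CPU threshold vs. a history-aware check that
# tolerates a brief spike. Data and thresholds are made up for the sketch.
from statistics import mean

def naive_trigger(cpu_now: float, threshold: float = 0.70) -> bool:
    """Fire the moment utilization crosses the line."""
    return cpu_now > threshold

def history_aware_trigger(cpu_samples, threshold: float = 0.70) -> bool:
    """Fire only when *sustained* load, not a momentary spike,
    stays above the threshold."""
    recent = cpu_samples[-6:]            # e.g. the last 30 minutes of samples
    return mean(recent) > threshold

lunch_spike = [0.55, 0.58, 0.60, 0.85, 0.62, 0.57]   # brief post-lunch burst
print(naive_trigger(0.85))                  # True: would move the workload
print(history_aware_trigger(lunch_spike))   # False: the pattern settles
```

A real scheduler would look at much richer history (daily and weekly seasonality, not just a rolling mean), but the contrast in outcomes is the same.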
vRealize Operations looks across the whole management plane: multiple clusters, across locations and workloads. It elevates the DRS concept from a single cluster to the big picture across clusters, so you get accurate capacity planning. The “custom data center” is an abstraction that aggregates all CPUs, memory, and storage into pools and then doles out resources from them. This approach optimizes placement for the good, rather than the perfect, of each workload: no one “starves,” and if “perfect” balance slips for periods here and there, that is accepted as best for the community as a whole. By taking this approach, vRealize Operations drives greater VM density, which means improved ROI and lower TCO. This kind of bigger-picture, dynamic load balancing boosts CPU utilization, and even if it only bumps utilization from 70 to 75 percent, those five percentage points across all your VMs add up, and become critical as you scale out to meet the demands that have become business as usual.
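A back-of-the-envelope calculation shows why a five-point utilization gain matters at scale. The fleet size, core counts, and per-VM footprint below are made-up figures, not anything from vRealize Operations itself.

```python
# Illustrative density math: the same hosts carry more VMs when the
# sustainable average utilization rises from 70% to 75%.

def vms_supported(hosts: int, cores_per_host: int,
                  avg_vm_cores: float, target_util: float) -> int:
    """How many average-sized VMs fit at a given utilization target."""
    usable_cores = hosts * cores_per_host * target_util
    return int(usable_cores / avg_vm_cores)

before = vms_supported(hosts=100, cores_per_host=32,
                       avg_vm_cores=2, target_util=0.70)
after = vms_supported(hosts=100, cores_per_host=32,
                      avg_vm_cores=2, target_util=0.75)
print(before, after, after - before)   # 1120 1200 80
```

Eighty extra VMs on the same 100 hosts, with no new hardware, is the ROI/TCO story in miniature.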
Finally, Predictive Distributed Resource Scheduler (PDRS), a feature for enterprise-grade installations, performs a more sophisticated analysis of performance over time. Able to discern patterns, not just spikes, PDRS goes well beyond the typical “snapshot in time” approach and drives highly accurate capacity planning. PDRS is unique in that it examines not just the spike but the curve around it: was it a hockey stick? A step-function build-up? A gradual ramp? Or has the behavior preceding each spike differed every time? This kind of understanding turns the cost-benefit analysis from a pursuit of perfect balance into a fluid plan of action based not on what’s best for the workload at this moment, but across a span of time.
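The spike-shape idea can be illustrated with a toy classifier over successive differences. This is not how PDRS works internally; real predictive analytics are far more involved. The labels and thresholds are invented purely to show what “looking at the curve around the spike” might mean.

```python
# Toy sketch: label a rising utilization series by the shape of its growth,
# using first and second differences. Thresholds are arbitrary.

def classify_rampup(samples):
    """Return 'step', 'accelerating', or 'gradual' for a rising series."""
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    if max(deltas) > 3 * (sum(deltas) / len(deltas)):
        # one jump dominates the rest: a step-function build-up
        return "step"
    growth = [d2 - d1 for d1, d2 in zip(deltas, deltas[1:])]
    if all(g >= 0 for g in growth):
        # each increase is bigger than the last: a hockey-stick curve
        return "accelerating"
    return "gradual"

print(classify_rampup([10, 11, 12, 13, 40]))   # step
print(classify_rampup([10, 12, 16, 24, 40]))   # accelerating
```

A step suggests a one-off event, while an accelerating curve suggests demand that will keep climbing; the two call for different placement responses, which is the point of analyzing the curve rather than the peak.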
Thomas Bryant is a Product Line Marketing Manager in the Cloud Management Business Unit at VMware. His primary focus is analysis of vendors in the cloud management and performance monitoring spaces. With over 15 years of experience in technology, Thomas has a broad-based background, including key industry certifications and recognitions from VMware, Microsoft, and Citrix. He has been recognized multiple times by VMware as a vExpert. He also enjoys reading, writing, and arithmetic.