Increasing Efficiency with vRealize Operations & Proactive Capacity Management
By Alberto Martinez
“Become more efficient”, “reduce costs” and “sustainable growth” are common drivers in our customers’ strategies, both at a business and IT level. Today, these drivers are more pressing than ever as market competition increases and investment decreases. What are you doing at a virtual infrastructure level within your organization to support them?
This post will focus on a key capability that addresses all three drivers: proactive capacity management.
Proactive capacity management enables IT to reclaim resources that aren’t being used and balance utilization, effectively right-sizing the virtual infrastructure. This increase in efficiency will free up resources that you can allocate to other projects, reducing costs and supporting growth.
Our experience in the field has shown that to perform this exercise successfully, you must first answer three key questions: the WHAT, the WHEN and the HOW.
What: Defining your Proactive Capacity Management
To proactively manage the capacity of your virtual infrastructure, you need to look at three areas:
- Resource Reclamation (oversized VMs)
Reclaim CPU & memory that are not being used by VMs
- Virtual Machine Recertification (idle & powered off VMs)
Confirm that the VMs are still supporting a service (being used)
- Hot Spot Identification (stressed VMs)
Identify VMs that are undersized and require more CPU or memory
Establishing a proactive capacity management process will efficiently right-size your provisioned virtual infrastructure. I explicitly refer to this as a “process” because right-sizing is more about the correct engagement with your business than the technical activities.
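To make the three areas concrete, here is a minimal sketch of how VMs might be sorted into them from average utilization figures. The thresholds, the `VmUsage` record and the `classify` function are assumptions for illustration only; they are not how vRealize Operations performs its own analysis, and you should tune the numbers to your environment.

```python
from dataclasses import dataclass

# Hypothetical thresholds (percent of provisioned capacity); adjust per environment.
IDLE_CPU_PCT = 5.0
OVERSIZED_PCT = 30.0
STRESSED_PCT = 90.0


@dataclass
class VmUsage:
    """Illustrative record; the field names are not taken from any vROps report."""
    name: str
    powered_on: bool
    cpu_avg_pct: float  # average CPU utilization over the analysis window
    mem_avg_pct: float  # average memory utilization over the same window


def classify(vm: VmUsage) -> str:
    """Map a VM to one of the three proactive capacity areas."""
    if not vm.powered_on:
        return "recertify (powered off)"
    if vm.cpu_avg_pct < IDLE_CPU_PCT:
        return "recertify (idle)"
    if vm.cpu_avg_pct >= STRESSED_PCT or vm.mem_avg_pct >= STRESSED_PCT:
        return "hot spot (stressed)"
    if vm.cpu_avg_pct < OVERSIZED_PCT or vm.mem_avg_pct < OVERSIZED_PCT:
        return "reclaim (oversized)"
    return "right-sized"


if __name__ == "__main__":
    for vm in (VmUsage("web01", True, 20.0, 50.0), VmUsage("db01", True, 95.0, 60.0)):
        print(vm.name, "->", classify(vm))
```

The order of the checks is a design choice: powered off and idle VMs are caught first because recertification (is this VM still needed at all?) should come before any resizing decision.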
When: Implementing proactive capacity at the correct moment
Proactive capacity management is not an activity to perform during the project lifecycle. At that point, IT architects and/or project managers will provide (and pay for) VM sizing based on vendor recommendations, which will probably include conservative margins and may seem oversized at the time. Instead, perform proactive capacity management once the VM has been provisioned, the project has been completed and a reasonable amount of time has passed. By then, the Virtual team will be better placed to analyze the performance and behavior of that VM in your infrastructure.
For example, one of my customers considered a VM’s cost fully depreciated after 4 years, at which point the cost moved from project CAPEX to the Virtual team’s OPEX budget. In that case, VMs older than 4 years would be the best candidates to focus on when implementing proactive capacity management.
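As a rough sketch of that selection rule, the check below flags a VM as a candidate once more than 4 years have passed since it was provisioned. The function name and the idea of keeping a provisioning date per VM are assumptions; the 4-year figure is just the example above, so treat it as a parameter.

```python
from datetime import date

DEPRECIATION_YEARS = 4  # from the customer example; use your own depreciation period


def is_depreciated(provisioned_on: date, today: date,
                   years: int = DEPRECIATION_YEARS) -> bool:
    """True when the VM is strictly older than the depreciation period."""
    try:
        cutoff = provisioned_on.replace(year=provisioned_on.year + years)
    except ValueError:
        # Provisioned on Feb 29 and the target year is not a leap year.
        cutoff = provisioned_on.replace(year=provisioned_on.year + years, day=28)
    return today > cutoff


if __name__ == "__main__":
    print(is_depreciated(date(2015, 1, 1), date(2020, 1, 1)))  # True
    print(is_depreciated(date(2018, 6, 1), date(2020, 1, 1)))  # False
```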
How: A 7 Step Process for Proactive Capacity Management
There is no “one size fits all” process for proactive capacity management: each organization has its own particularities, so customizing the process is key to success. The process I have laid out below highlights the key steps you shouldn’t miss and the “configurations” you should apply based on your specific environment.
- Extract Reports from the vRealize Operations Tool (vROPs)
Use the information available in vROPs reports about the vSphere environment as an input to the proactive capacity management process, and make sure that you ignore those VMs that have already gone through the process (use vSphere tagging to identify them!). Agree on the scope of the analysis (environment, virtual platforms, dedicated zones such as DMZs or services such as databases) and identify key experienced individuals to run the process, people with deep knowledge of your vSphere environment and an understanding of the lines of business.
- Analyze the Information Extracted
Create a detailed list of candidates from the information extracted from vROPs, including key information such as VM name, environment, line of business and action to be performed (oversized CPU / Mem, powered off, idle, stressed CPU / Mem). Include the cost savings opportunity for each candidate (oversized, powered off, idle) or the additional cost for stressed VMs. Expressing the analysis in terms of cost matters: I’ve seen this process fail many times because communications were too technical!
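One possible shape for an entry in that candidate list is sketched below. The unit costs and field names are hypothetical and only illustrate how a cost figure can be attached to every recommendation; substitute the rates your organization actually charges.

```python
from dataclasses import dataclass

# Hypothetical monthly unit costs; replace with your own chargeback rates.
COST_PER_VCPU = 20.0
COST_PER_GB_RAM = 5.0


@dataclass
class Candidate:
    vm_name: str
    environment: str
    line_of_business: str
    action: str        # "oversized", "powered off", "idle" or "stressed"
    vcpu_delta: int    # negative = vCPUs reclaimed, positive = vCPUs to add
    mem_gb_delta: int  # negative = GB reclaimed, positive = GB to add

    @property
    def monthly_cost_delta(self) -> float:
        """Negative values are savings; positive values are additional cost."""
        return self.vcpu_delta * COST_PER_VCPU + self.mem_gb_delta * COST_PER_GB_RAM


if __name__ == "__main__":
    oversized = Candidate("app01", "dev", "Finance", "oversized", -2, -8)
    stressed = Candidate("db07", "prod", "Sales", "stressed", 2, 4)
    print(oversized.monthly_cost_delta)  # -80.0
    print(stressed.monthly_cost_delta)   # 60.0
```

Using one signed number for both savings and additional costs keeps the list easy to total and easy to explain to a non-technical VM owner.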
- Engage with the VM Owner
A big part of the process is interacting with those who are responsible for the VM. First identify who to engage with (typically someone from a line of business or within the IT or Application Development organizations). Define what the engagement process will be (ask for approval, just inform them, or do not engage with them at all). Finally, standardize the communications by using templates with cost savings information and detailed technical analysis so the person responsible for the VM is confident in your recommendation.
- Schedule the Changes
Once the list of approved candidates is finalized and you have agreed with the VM owners on when to implement the changes, consolidate those dates into a calendar that takes change windows and freeze periods into account. Create change requests to track those implementations and include roll-back plans accordingly (snapshots, revert resources reclaimed, etc.).
- Implement the Changes
Perform the technical implementation of the proactive capacity management. Roll back if there is any problem during the implementation, inform the VM owner of the outcome (successful, failed, etc.) and set the appropriate VM tag (reclaimed OK, reclaimed FAILED).
- Monitor
Define a reasonable period of time to monitor the performance of the updated VM and app / service. This could be a week, two weeks, a month, and so on. Use a vROPs custom dashboard to monitor the health of those updated VMs.
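The calendar consolidation mentioned above (change windows and freeze periods) can be sketched as a small date search. Everything here is an assumption for illustration: a single weekly change window on Saturdays, freeze periods given as inclusive date ranges, and the `next_change_date` helper itself.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday() value; hypothetical weekly change window


def next_change_date(preferred: date, freeze_periods, change_weekday: int = SATURDAY,
                     horizon_days: int = 366) -> date:
    """Return the first change-window day on or after `preferred` outside every freeze.

    `freeze_periods` is an iterable of (start, end) dates, both inclusive.
    """
    for offset in range(horizon_days):
        day = preferred + timedelta(days=offset)
        frozen = any(start <= day <= end for start, end in freeze_periods)
        if day.weekday() == change_weekday and not frozen:
            return day
    raise ValueError("no change window available within the search horizon")


if __name__ == "__main__":
    year_end_freeze = [(date(2024, 12, 21), date(2025, 1, 5))]
    # The owner asked for 20 December 2024, but the year-end freeze pushes the change out.
    print(next_change_date(date(2024, 12, 20), year_end_freeze))  # 2025-01-11
```

The search horizon is there so that an open-ended or badly entered freeze period produces an error instead of a loop that never ends.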
- Report & Close
Consolidate the information captured during the monitoring phase, create a detailed “Proactive Capacity Analysis” report and distribute it to the appropriate list of stakeholders. This list could include the application or service owner, IT management, etc. The report should include achieved cost savings (approved and reclaimed candidates) and potential cost opportunities (rejected candidates).
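The two headline figures of the report could be computed along these lines. The record layout and the status values ("reclaimed", "rejected", "failed") are assumptions made for this sketch, loosely mirroring the VM tags used earlier in the process.

```python
def summarize(candidates):
    """Split savings into achieved (reclaimed) and potential (rejected) totals.

    Each candidate is a dict with hypothetical keys "status" and "monthly_saving".
    Candidates in any other status (for example "failed") are left out of both totals.
    """
    achieved = sum(c["monthly_saving"] for c in candidates if c["status"] == "reclaimed")
    potential = sum(c["monthly_saving"] for c in candidates if c["status"] == "rejected")
    return {"achieved": float(achieved), "potential": float(potential)}


if __name__ == "__main__":
    results = [
        {"vm": "app01", "status": "reclaimed", "monthly_saving": 80.0},
        {"vm": "app02", "status": "reclaimed", "monthly_saving": 20.0},
        {"vm": "web09", "status": "rejected", "monthly_saving": 50.0},
        {"vm": "db03", "status": "failed", "monthly_saving": 10.0},
    ]
    print(summarize(results))  # {'achieved': 100.0, 'potential': 50.0}
```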
Key Considerations for Implementing Proactive Capacity Management
As we’ve worked with customers to apply this methodology, our Operations Transformation Services team has identified some key common considerations that will drive the success of this initiative:
- Start simple, test the process and then expand it to more complex environments. At the beginning you will have a large list of potential candidates, so start by executing the process fairly regularly, such as every week, with a small number of target VMs selected from a less risky environment, such as application development. It’s much easier to power off a VM in development than in production, don’t you think? This way you will have more control over those workloads while you continue to refine the process.
- Sponsorship is crucial to provide the correct level of empowerment to the Virtual team. They need this support in order to lead the process effectively and make the appropriate decisions.
- Build a trusting relationship with the business by presenting consistent information and establishing confidence in the IT organization around the proactive capacity management process.
- It’s not about cost savings today, it’s about cost savings tomorrow. Resource reclamation improves long-term efficiency in two ways: the reclaimed resources increase the pool of pre-provisioned resources (freeing up physical hosts), and continuously executing the proactive capacity process keeps the infrastructure right-sized.
If you are ready to optimize your virtual infrastructure through proactive capacity management or want to know more about this key IT infrastructure capability, the VMware Operations Transformation for Performance & Capacity Management service is a great place to start. Reach out to your VMware representative and engage with the team to get started.
Alberto Martinez is an operations architect with the VMware Operations Transformation EMEA practice and is based in Spain.