By Cathal Cleary, Director, CPE Services management, VMware
While cloud technology has revolutionized the way enterprises consume IT, there are significant challenges surrounding control and optimization of spend. The public cloud makes it very easy to sign up and consume resources; so easy, in fact, that people tend not to think about the potential costs. This was definitely true at VMware. When we first started our cloud journey, we had a variety of public cloud accounts created within the business and funded by credit cards. Yet few, if any, knew the actual ongoing costs.
Our first step was to consolidate all these accounts under an umbrella account and give decision makers complete visibility. While native tools, such as AWS Cost Explorer, could provide some visibility within their own platform (i.e., AWS), it was difficult to get consolidated spend data across all clouds. Similarly, it was difficult to associate that spend with key business dimensions such as business unit and cost center.
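To make the consolidation problem concrete, here is a minimal sketch of rolling up spend from several clouds by a business dimension. The row layout, the `business_unit` tag name, the cloud labels, and the figures are all hypothetical, chosen for illustration; real billing exports differ per provider, and this is not how CloudHealth works internally.

```python
from collections import defaultdict

# Hypothetical, hand-written billing rows. Real exports differ per provider
# and would need to be normalized into a common shape like this first.
line_items = [
    {"cloud": "aws", "account": "a-1", "cost": 120.0,
     "tags": {"business_unit": "engineering"}},
    {"cloud": "other-cloud", "account": "o-9", "cost": 80.0,
     "tags": {"business_unit": "engineering"}},
    {"cloud": "aws", "account": "a-2", "cost": 50.0,
     "tags": {"business_unit": "finance"}},
    {"cloud": "other-cloud", "account": "o-3", "cost": 30.0, "tags": {}},
]


def spend_by_dimension(items, dimension):
    """Sum cost per value of a tag; untagged rows land in 'unallocated'."""
    totals = defaultdict(float)
    for item in items:
        key = item["tags"].get(dimension, "unallocated")
        totals[key] += item["cost"]
    return dict(totals)


report = spend_by_dimension(line_items, "business_unit")
for unit, cost in sorted(report.items()):
    print(f"{unit}: {cost:.2f}")
```

The "unallocated" bucket is the interesting part of such a report: it shows how much spend cannot yet be attributed to any business unit or cost center.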
What was needed was best-in-class, dedicated tooling. After extensive research, we discovered CloudHealth could resolve these challenges.
CloudHealth allows IT to present data to each business unit in terms of what they are actually spending and on what type of service. In many cases, this was an eye-opener as teams simply did not know. The software proved so effective that, today, every group from finance to engineering must employ this new information to clean up their cloud usage and spend.
With a consolidated view on spend, we could now approach the vendors for better pricing based on volume. Public cloud providers offer bulk buying programs, known as ‘reserved instances’ (RIs), that allow us to trade predictability for better pricing. Under a typical RI program, you agree with the cloud provider to use a certain amount of capacity during a set period—typically one or three years—in exchange for a predictable, discounted price.
Unfortunately, these types of programs also have many nuances and options that make them difficult to manage. CloudHealth adds a tremendous amount of value here, and our team was able to pre-purchase 60% of our AWS instance capacity through the AWS RI program. However, we must still continue to measure, report and optimize the RI fleet during the contract term to ensure compliance. This is accomplished by regularly buying new RIs as older contracts end, and by converting RIs from one instance type to another using convertible RIs.
Through regular RI purchase and management, we were able to save 35% across our entire EC2 (Amazon’s compute service) fleet over the cost of buying all on-demand. We plan to improve this figure through:
- Increasing our investment in EC2 RI fleet through regular assessments and purchases.
- Decreasing our reliance on EC2 through encouraging our developer community to use more cloud-native techniques such as containers and serverless computing.
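The relationship between RI coverage, RI discount and fleet-wide savings can be sketched with some simple arithmetic. This is a deliberately simplified model, not VMware's or AWS's actual calculation: it normalizes the all-on-demand cost to 1.0, assumes a single flat discount, and the function name and the 40% discount are hypothetical. Only the 60% coverage figure comes from the text above.

```python
def fleet_savings(reserved, discount, utilization=1.0):
    """Fraction saved versus paying on-demand for the whole fleet.

    reserved:    reserved capacity as a fraction of total demand (0..1)
    discount:    RI discount versus the on-demand price (0..1)
    utilization: fraction of the reserved capacity actually used (0..1)

    Simplified model: reservations are paid for whether used or not, and
    any demand not served by a used reservation is bought on-demand.
    """
    for name, value in (("reserved", reserved), ("discount", discount),
                        ("utilization", utilization)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    ri_cost = reserved * (1.0 - discount)
    on_demand_cost = 1.0 - reserved * utilization
    return 1.0 - (ri_cost + on_demand_cost)


# 60% coverage is from the article; the 40% discount is a made-up example.
fully_used = fleet_savings(reserved=0.60, discount=0.40)
half_used = fleet_savings(reserved=0.60, discount=0.40, utilization=0.50)
print(f"fully used reservations: {fully_used:.1%}")
print(f"half-used reservations:  {half_used:.1%}")
```

In this model, savings turn negative once utilization drops below one minus the discount, which is one way to see why the RI fleet has to be measured and rebalanced throughout the contract term rather than bought once and forgotten.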
Phase 3—Proactive Management
CloudHealth has a number of policy options that allow us to be proactive with cloud optimization. We implemented a budget policy that alerts engineers and finance controllers when an account nears 80% of its monthly budget. We also created a policy to expose ‘waste’ in the form of unattached Amazon Elastic Block Store (EBS) volumes, unused elastic IPs and aging snapshots. Once we get stakeholder buy-in, we intend to enable an automatic cleanup option for such waste.
With CloudHealth, our team significantly reduced VMware’s overall cloud resource waste. This effort alone reduced our public cloud compute cost by more than 30%. By exposing data, we stabilized our public cloud spend growth and brought new levels of predictability to our budgeting efforts.
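The two policies described above can be illustrated with a small sketch. It is not CloudHealth's policy engine: the function names, account names and figures are hypothetical. The volume dictionaries borrow the `VolumeId` and `State` field names from the EC2 DescribeVolumes response, where a state of `available` means the volume is not attached to any instance.

```python
BUDGET_ALERT_THRESHOLD = 0.80  # the 80% figure from the text


def accounts_near_budget(spend, budgets, threshold=BUDGET_ALERT_THRESHOLD):
    """Return accounts whose month-to-date spend has reached the threshold."""
    flagged = []
    for account, budget in budgets.items():
        spent = spend.get(account, 0.0)
        if budget > 0 and spent / budget >= threshold:
            flagged.append(account)
    return sorted(flagged)


def unattached_volumes(volumes):
    """Return IDs of EBS volumes in the 'available' (unattached) state."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


# Hypothetical sample data for illustration only.
budgets = {"team-a": 1000.0, "team-b": 500.0, "team-c": 2000.0}
spend = {"team-a": 850.0, "team-b": 200.0, "team-c": 1600.0}
volumes = [
    {"VolumeId": "vol-001", "State": "in-use"},
    {"VolumeId": "vol-002", "State": "available"},
    {"VolumeId": "vol-003", "State": "available"},
]

print("Budget alerts:", accounts_near_budget(spend, budgets))
print("Unattached volumes:", unattached_volumes(volumes))
```

Reporting the candidates first and deleting nothing mirrors the approach in the text: expose the waste, then automate cleanup only once stakeholders agree.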