
A VMware Perspective on IT as a Service, Part 2: An In-house Example of IT Transformation

By: Paul Chapman, VMware Vice President Global Infrastructure and Cloud Operations

In this series of posts, I’m offering a VMware Corporate IT perspective on the journey to IT as a Service (ITaaS), looking at how we adopted the movement ourselves, sharing some of the many benefits that ITaaS is bringing us, and offering some insights on how – if you’re considering taking the plunge – you might successfully make the transition yourself.

Last time, I outlined the context for the movement to IT as a Service – one that suggests we’re now at a point where IT can no longer hide behind the complexity of IT environments, and where IT organizations need to deliver on new consumer expectations of service delivery if they are to have the agility and efficiency to deliver at the speed of business.

Today, I’m going to share the story of one of the functional IT groups at VMware – our Applications Operations group – that has transformed itself by shifting its focus to agility and automation, with game-changing results. If you’re curious to learn more, check out the full case study, or a short summary video here.

A Problem with Process

Here’s what happened: The Cloud Operations group within VMware corporate IT oversees the support of a portfolio of more than 200 business applications. The application operations team (AppOps) provisions and manages very complex SDLC development and test environments for a global team of more than 600 developers and quality assurance engineers who work on the VMware program portfolio.

By the middle of 2012, the AppOps team realized that it faced a serious issue with provisioning these environments.

As things stood, their processes were:

  • Slow – Manually provisioning a dev/test SDLC instance for a full enterprise applications ecosystem was taking 4-6 weeks per instance.
  • Disruptive – Hundreds of developers had to wait for extended periods for a reliable new instance, multiple times during the lifecycle.
  • Risky – Cascading delays kept other portfolio projects from starting or completing on time, potentially costing millions of dollars.
  • Inconsistent – Quality and lead times were unpredictable, varying with schedule complexity, the differing outcomes of manually repeated processes, and the capacity and availability of team members distributed around the globe.

The knock-on impact of a delay was very costly. Every time a new environment experienced delays, developers were idle and millions of dollars were at stake. This made portfolio planning inordinately difficult. We could have shrunk the portfolio and slowed the delivery of business critical programs in response, but that was unacceptable given our overall corporate growth objectives.

Not surprisingly, IT was under considerable pressure to increase its agility, speed, and throughput.

Not the Easy Fix

Clearly, AppOps needed to reduce provisioning times and increase schedule predictability and service quality.

One way to do that would have been to improve the efficiency of the large “human middleware” they already had in place, applying lean methodologies and trying to be as “efficient” as possible when executing standard repeatable tasks.

However, a thorough process review made it clear that more than a continuous efficiency program was required. The primary issue was that they were scheduling and managing a large number of people who were performing, for the most part, skilled but repeatable tasks. Even with an improved provisioning process, the human-middleware problem would never fully go away, because speed and predictability could never reach the desired goals.

Instead, the AppOps group chose to replace its provisioning process entirely, automating it on a VMware on-premises private cloud based on the software-defined data center. SDLC instance provisioning would be fully automated through blueprints, policies, and the automation and management capabilities of the VMware vCloud® Suite and other adjacent tools.
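To make the idea of blueprint-based provisioning concrete, here is a minimal sketch in Python. It is not the vCloud Suite API or the AppOps implementation: the `Component` and `Blueprint` classes, the `provision` function, and the `deploy` callback are all hypothetical names invented for illustration. It only shows the general principle that a blueprint declares components and their dependencies once, so an instance can be built in a repeatable order without manual scheduling.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable


@dataclass(frozen=True)
class Component:
    """One piece of an SDLC instance (hypothetical model, not a VMware type)."""
    name: str
    depends_on: tuple[str, ...] = ()


@dataclass
class Blueprint:
    """A declarative description of everything an instance contains."""
    name: str
    components: list[Component] = field(default_factory=list)

    def provisioning_order(self) -> list[str]:
        # Map each component to the set of components it depends on,
        # then let the topological sort place dependencies first.
        graph = {c.name: set(c.depends_on) for c in self.components}
        return list(TopologicalSorter(graph).static_order())


def provision(blueprint: Blueprint, deploy: Callable[[str], None]) -> None:
    """Deploy every component in dependency order via the supplied callback."""
    for name in blueprint.provisioning_order():
        deploy(name)
```

Calling `provision(blueprint, deploy)` with a real deployment function would then build the same instance the same way every time, which is the property the manual process lacked.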

If they were to succeed, two factors would be critical:

  • Ambitious, long-term objectives. To be successful, any solution needed to be game changing – instead of making incremental improvements to the existing process, AppOps was looking to turn a process that traditionally took 4-6 weeks into one taking just a matter of hours. Solving this problem required a radically different approach, built from the ground up.
  • An available private cloud. VMware had already deployed, at scale, its private cloud (called ‘Project OneCloud’), delivering infrastructure-as-a-service (IaaS) capabilities for internal use. With vCloud Suite’s automation and management capabilities, the private cloud could host all non-production SDLC instances – eliminating the need for lengthy hardware provisioning cycles.

By late 2012, the AppOps team was ready to start building the new, automated and streamlined provisioning platform, setting itself the goal of deploying all Dev/Test SDLC instances within 24 hours of a request.

Doing this meant driving transformation in three areas:

  • Architecture – Shifting from a traditional virtualized data center environment to an SDDC private cloud and deploying cloud management with automation capabilities to provision complex SDLC environments. Each instance contains over 30 applications, including the company’s full ERP, custom applications, portals, middleware, IDM, BI, webservers, app servers, integrations, databases, and more.
  • Operations – Converting manual, time-consuming processes to an end-to-end, automated, scripted process with blueprint-based provisioning. The transition would also require investment in change management, with training and education to support employees as they moved to more value-added and meaningful roles in the new cloud operating model.
  • Financial – Moving from a project-based capex infrastructure funding model to a service-based opex consumption and chargeback model. Instead of incurring costs for building and maintaining infrastructure to support the virtual machines, IT could pass the cost of workloads to individual project requestors. Because IT can now provision quickly and provide transparent opex service costs, teams de-provision instances more readily, which has in turn increased infrastructure utilization and reduced spend on net-new infrastructure.
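The consumption and chargeback model in the last bullet can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `InstanceUsage` record, the `chargeback` function, and a flat rate per VM-hour are hypothetical and are not taken from VMware’s actual pricing or tooling. The point is simply that when cost follows VM-hours consumed, a project that releases its instance early pays less.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceUsage:
    """Usage record for one SDLC instance (hypothetical structure)."""
    project: str
    vm_count: int
    hours_held: float


def chargeback(usages: list[InstanceUsage], rate_per_vm_hour: float) -> dict[str, float]:
    """Total the cost per project, assuming a flat illustrative rate per VM-hour."""
    totals: dict[str, float] = {}
    for usage in usages:
        cost = usage.vm_count * usage.hours_held * rate_per_vm_hour
        totals[usage.project] = totals.get(usage.project, 0.0) + cost
    return totals
```

Under a model like this, holding an instance open “just in case” shows up directly on the requesting project’s bill, which is one plausible reason de-provisioning increased.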

The Payoff and Business Benefit

Phase one of the project – deploying basic automated provisioning and management capabilities – has now been completed. 2,800 virtual machines that support dev/test instances have been transitioned to the new OneCloud environment, resulting in game-changing benefits:

  • Reduced provisioning time from 4-6 weeks to 36 hours, and on track to achieve the goal of under 24 hours,
  • Increased productivity of 600 developers by as much as 20 percent,
  • Improved service quality so that AppOps can now consistently say “Yes” to all project requests in the time required,
  • Saved the business $6M per year in infrastructure and operating costs,
  • Moved people to higher-order, more meaningful IT roles, e.g. blueprinting and automation design.
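To put the provisioning-time figures above in relative terms, the short calculation below converts weeks to hours and computes the speedup. It assumes the 4-6 weeks are calendar weeks of 7 × 24 hours; if the original figure meant working time, the ratios would be smaller. The function names are illustrative only.

```python
HOURS_PER_WEEK = 7 * 24  # assumption: calendar weeks, not working hours


def speedup(old_weeks: float, new_hours: float) -> float:
    """How many times faster the new process is than the old one."""
    return old_weeks * HOURS_PER_WEEK / new_hours


def reduction_pct(old_weeks: float, new_hours: float) -> float:
    """Percentage of the original lead time that was eliminated."""
    return 100 * (1 - new_hours / (old_weeks * HOURS_PER_WEEK))


for weeks in (4, 6):
    print(
        f"{weeks} weeks -> 36 h: {speedup(weeks, 36):.1f}x faster, "
        f"{reduction_pct(weeks, 36):.1f}% less lead time"
    )
```

On that assumption, 36 hours is roughly 19 to 28 times faster than the manual process, and reaching the 24-hour goal would raise that to 28 to 42 times.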

Phase two will focus on further enhancing automation and management capabilities and transitioning more pre-production environments to the private cloud.

Lessons Learned

  • Agility investments are self-sustaining. Investing in increased agility yields significant additional benefits, such as substantially reduced operating and infrastructure costs and increased service quality.
  • vCloud Suite is a full solution. The AppOps team implemented vCloud Suite to automate provisioning and management of SDLC instances. Out-of-box functionality let them automate and manage a wide range of core tasks. The availability of SDKs and APIs let them deliver additional automation and management functionality through adjacent tools.
  • On-demand capabilities change IT service consumption. SDLC instances are no longer viewed with the same risk outlook as before. Where developers and applications owners formerly felt the need to keep an instance open for multiple and/or on-going projects, AppOps can now release those instances back into the provisioning pool in a “disposable infrastructure” service consumption model.
  • APIs replace ticketing and late-night meetings. A service catalog and API calls help IT clarify and simplify communication about the services AppOps delivers and what its customers can expect in return. Efficiency has replaced the time-consuming, difficult, and highly variable task of scheduling and coordinating work between multiple, globally distributed teams.

Key Takeaway:

The VMware corporate IT organization decided to invest in improving agility, and, as a byproduct, not only increased service speed and quality, but also dramatically lowered IT infrastructure and operating costs.

Next time, I’ll look at agility: how we measure it and how we keep continuously improving. In Part 4, I’ll explain what it took to stand up and run our own internal private cloud, which so far includes ~50k VMs.


Follow @VMwareCloudOps & @PaulChapmanVM on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.