
Automated Deployment and Testing of Big ‘Hairball’ Application Stacks

By: Venkat Gopalakrishnan

What changes when you automate provisioning of complex, business-critical application stacks? Many good things, as it turns out.

As the director of IT Operations in VMware’s corporate IT team, I’m responsible for our portfolio of 140 enterprise applications. Some of these applications are mission critical and power the value-chain activities that matter most in delivering value to our customers. My team’s customers are the 600 developers who continuously update and modify our critical business applications. We support about 40 major development initiatives each year. For each initiative, we stand up a fully functional dev/test environment at multiple stages of the development lifecycle before the project goes into production. Each time, it takes two to five weeks of calendar time for my team to stand up, functionally test and environmentally test each instance. During that time, developers (my customers) either sit idle or start other projects, which disrupts the flow of their work.

These are not simple deployments of a web server or an Oracle database. Each instance contains full working copies of major applications such as Oracle or Workday, along with the supporting applications, middleware, databases, load balancers and integration points needed to give developers working copies of production systems. See the diagram below of a typical dev/test instance, which we provision multiple times over the course of the 40 major development projects we support each year:

These complex instances can include many VMs. As a result, while we have roughly 500 production VMs, we have more than 4,000 non-production VMs related to these development projects.

Last year, using our vCloud Suite, VMware IT deployed a private cloud in a basic IaaS configuration and called it “OneCloud.” Just one year after its launch, nine internal groups (tenants) run over 35,000 VMs in OneCloud. By year’s end, we’ll have 12 tenants and 50,000 VMs! In fact, VMworld runs on OneCloud, as do our online training, global services and support. If a sales engineer gives you a VMware product demo, that demo instance is provisioned in OneCloud. The more tenants and workloads we add to OneCloud, the lower the cost per VM. As a result, the monthly cost per VM in OneCloud is much lower than in the traditional datacenter environment we have been using for our 4,000 dev/test VMs.

This year, we started moving all 4,000 of our non-production enterprise application dev/test VMs to our private cloud. OneCloud offers basic Infrastructure as a Service (IaaS); however, my team is still responsible for operations related to these workloads. We monitor performance, maintain instances during their lifecycle, upgrade and patch components as needed, and retire instances when development is done. The IaaS team provides resources on demand and ensures that they are available and perform to SLA, but my team is responsible for everything else, including provisioning these complex instances.

We developed advanced capabilities using a range of products in the vCloud Suite to automate provisioning and management of our non-production workloads on OneCloud, but not every tenant in OneCloud needs them. If you deploy a working sales demo, you get one-click deployment from a stored VM image. Delivering one-click deployment of the big “hairball” enterprise application dev/test instances we built for developers, however, requires a more sophisticated solution. Our automation and management capabilities span multiple tools and orchestrate work across dozens of steps. Provisioning a VM is one small step in a much bigger process.
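To make “provisioning a VM is one small step” concrete, here is a minimal sketch of a dependency-ordered provisioning pipeline in Python. The step names, the `PIPELINE` graph and the `run_pipeline` helper are hypothetical illustrations, not the vCloud Suite tooling we actually use; in a real pipeline each step would call out to provisioning, configuration and test tools.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical steps for one dev/test instance. Each step maps to the
# set of steps that must complete before it can start.
PIPELINE = {
    "provision_vms": set(),
    "install_middleware": {"provision_vms"},
    "clone_database": {"provision_vms"},
    "deploy_application": {"install_middleware", "clone_database"},
    "configure_load_balancer": {"deploy_application"},
    "wire_integrations": {"deploy_application"},
    "functional_tests": {"configure_load_balancer", "wire_integrations"},
    "environmental_tests": {"functional_tests"},
}


def run_pipeline(pipeline, run_step):
    """Run every step in dependency order and stop at the first failure.

    run_step is a callable that takes a step name and returns True on success.
    Returns the list of completed steps in the order they ran.
    """
    completed = []
    for step in TopologicalSorter(pipeline).static_order():
        if not run_step(step):
            raise RuntimeError(f"step failed: {step}")
        completed.append(step)
    return completed


def fake_step(step):
    # Stand-in for real automation: just report the step and succeed.
    print(f"running {step}")
    return True


if __name__ == "__main__":
    run_pipeline(PIPELINE, fake_step)
```

Modeling the steps as a dependency graph rather than a fixed list also shows which steps could run in parallel: in this sketch, installing middleware and cloning the database both depend only on the VMs being provisioned.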

So, what happened when we automated provisioning of these complex instances and moved to the private cloud?

  1. We will save $4.5M in annual infrastructure costs by moving 4,000 VMs to a low-cost private cloud;
  2. We will cut provisioning time from two to six weeks to just 24 hours, including functional and environmental testing, which will save us an additional $1.5M in operational costs;
  3. And (this is a big AND), we give eight weeks of productivity back to our 600 developers on every project. Now, when developers call and need a project started, we can finally say “yes” every time, regardless of how many other projects we are working on.
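As a back-of-the-envelope check, the savings figures above can be combined in a few lines of Python. The per-VM values are simply derived by spreading the stated infrastructure savings evenly across the 4,000 VMs; they are not separately measured numbers.

```python
# Figures taken from the list above (USD per year).
INFRA_SAVINGS_PER_YEAR = 4_500_000
OPS_SAVINGS_PER_YEAR = 1_500_000
VM_COUNT = 4_000


def savings_summary(infra, ops, vms):
    """Combine the annual savings and derive an average per-VM infrastructure saving."""
    infra_per_vm_year = infra / vms
    return {
        "total_per_year": infra + ops,
        "infra_per_vm_per_year": infra_per_vm_year,
        "infra_per_vm_per_month": infra_per_vm_year / 12,
    }


if __name__ == "__main__":
    print(savings_summary(INFRA_SAVINGS_PER_YEAR, OPS_SAVINGS_PER_YEAR, VM_COUNT))
```

With these inputs the combined saving is $6M per year, and the infrastructure saving averages $1,125 per VM per year, or $93.75 per VM per month.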

Developers are my customers. Letting them shave two months off a nine-month project, instead of telling them “No, we are too busy”? That, as they say, is priceless!

So, how exactly did we do this? What lessons did we learn using our own products?

To find out, join me for our VMUG CloudOps SIG webcast this Thursday, July 25th. I’ll share all the details about the operational changes we made to deliver these benefits.

If you’re attending VMworld, we’ll also go deep on the technical aspects of the automation, the management capabilities we developed, and the operational transformation that made it all work in the following sessions:

OPT5194 – Moving Enterprise Application Dev/Test to VMware’s internal Private Cloud – Operations Transformation

VSVC4948 – Moving Enterprise Application Dev/Test to VMware’s internal Private Cloud – Architecture, Implementation and Integration

Interested in more CloudOps sessions at VMworld? See our highlights of the Operations Transformation track. For more on automation, check out some of our previous blog posts:

For future updates, follow @VMwareCloudOps on Twitter, and join the conversation by using the #CloudOps and #SDDC hashtags.