
Monthly Archives: July 2013

Automated Development and Test in Private Cloud: Join VMUG CloudOps SIG Webinar 7/25

Are you an IT practitioner considering private cloud?

Hear from VMware IT leadership about what they learned using VMware products to build a private cloud on an SDDC architecture. They will discuss how they transformed people, process, organizational structure, governance, and financial model to make VMware’s private cloud IaaS successful. This Thursday, join Venkat Gopalakrishnan, Director of the Software Defined Data Center (SDDC) and IT Transformation Initiatives, and Kurt Milne, Director of the VMware CloudOps Program, for an exclusive webinar.

The webinar will cover:

  • SDLC – supporting dev/test for 600 developers.
  • Using the vCloud Suite to automate end-to-end dev/test instance provisioning for complex application stacks.
  • Moving 4,000 non-production dev/test VMs from traditional virtual infrastructure to private cloud.
  • Improving agility and service quality, while also saving $6M in annual infrastructure and operating costs.

BONUS: This is a sneak peek of OPT5194 – VMware Private Cloud – Operations Transformation – one of the biggest sessions at VMworld 2013.

Register for this VMUG CloudOps SIG webinar today to see how you can take the private cloud from operational to transformational and learn how the private cloud can fit into your work environment. For a head start, take a look at our recent post, “Automated Deployment and Testing Big ‘Hairball’ Application Stacks” to hear more about the deployment from Venkat, one of the webinar’s speakers.

We will also be live tweeting during the event via @VMwareCloudOps for anyone who is unable to attend the webcast. Feel free to join the conversation using the #CloudOps and #SDDC hashtags.

Automated Deployment and Testing Big ‘Hairball’ Application Stacks

By: Venkat Gopalakrishnan

What changes when you automate provisioning of complex, business-critical application stacks? Many good things, as it turns out.

As the director of IT Operations in VMware’s corporate IT team, I’m responsible for our portfolio of 140 enterprise applications. Some of these applications are mission critical and power the most important value chain activities we use to deliver value to our customers. My team’s customers are the 600 developers who update and modify our critical business applications on an ongoing basis. We support about 40 major development initiatives each year. For each initiative, we stand up a fully functional dev/test environment at multiple stages of the development lifecycle, before each project goes into production. Each time, it takes two to five weeks of calendar time for my team to stand up, functionally test, and environmentally test each instance. During that time, developers are forced to sit idle, or to start on other projects that end up disrupting the flow of their work.

These are not simple deployments of a web server or Oracle database. Each instance contains full working copies of major applications like Oracle or Workday, as well as the supporting applications, middleware, databases, load balancers, and integration points that are required to hand working copies of production systems to developers so they can do their work. See the diagram below for a typical dev/test instance that we provision multiple times across 40 major development projects each year:

These complex instances can include many VMs. As a result, while we have roughly 500 production VMs, we have more than 4,000 non-production VMs related to these development projects.

Last year, using our vCloud Suite, VMware IT deployed a private cloud in a basic IaaS configuration and called it “OneCloud.” Just one year after its launch, there are now nine different internal groups (tenants) with over 35,000 VMs in OneCloud. By year’s end, we’ll have 12 tenants and 50,000 VMs! In fact, VMworld runs on OneCloud, as does our online training, global services and support. If a sales engineer gives you a VMware product demo, that demo instance is provisioned in OneCloud. The more tenants and workloads you add in OneCloud, the lower the cost per VM. As a result, the monthly cost per VM in OneCloud is much lower than the cost in the traditional datacenter environment we have been using for our 4,000 dev/test VMs.

This year, we started moving all 4,000 of our non-production enterprise application dev/test VMs to our private cloud. It offers basic Infrastructure as a Service (IaaS). However, my team is still responsible for Operations related to these workloads: We monitor performance. We maintain instances during their lifecycle. We upgrade and patch components as needed, and we retire instances when development is done. The IaaS team provides resources on-demand and ensures that those resources are available and perform to SLA, but my team is responsible for everything else, including provisioning these complex instances.

We developed advanced capabilities using a range of products in the vCloud Suite to automate provisioning and management of our non-production workloads on OneCloud, but not every tenant in OneCloud needs these advanced automation and management capabilities. If you deploy a working sales demo, you get one-click deployment from a stored VM image. But one-click deployment of the big “hairball” enterprise application dev/test instances that we built for developers requires a more sophisticated solution. Our automation and management capabilities span multiple tools and coordinate work across dozens of steps. Provisioning a VM is one small step in a much bigger process.
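
To give a rough sense of what those dozens of steps look like, here is a minimal, product-agnostic Python sketch of such a provisioning pipeline. The step names and the function are illustrative assumptions, not the actual vCloud Suite workflow we built; the point is simply that deploying VMs is only one stage among many.

```python
# Illustrative ordering of a dev/test provisioning pipeline; in practice each
# step would call out to the relevant tool (cloud API, config management,
# data-masking jobs, test harness) and check the result before moving on.
PIPELINE = [
    "reserve_capacity",         # place the instance in the right cloud tenant
    "deploy_vms",               # "provision a VM" is just one of many steps
    "restore_masked_data",      # refresh databases with masked production data
    "configure_middleware",     # app servers, message queues, load balancers
    "wire_integrations",        # endpoints to the other enterprise systems
    "run_functional_tests",     # verify the application stack actually works
    "run_environment_tests",    # verify connectivity, capacity, and monitoring
    "hand_over_to_developers",  # publish access details to the project team
]

def provision_instance(project, steps=PIPELINE):
    for step in steps:
        # Real code would invoke the tool for this step and verify its output.
        print(f"[{project}] {step} ... ok")

provision_instance("erp-release-2013Q3")
```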

So, what happened when we automated provisioning of these complex instances and moved to the private cloud?

  1. We will save $4.5M in annual infrastructure costs by moving 4,000 VMs to a low-cost private cloud;
  2. We will reduce our provisioning time from two to six weeks to only 24 hours, including functional and environmental testing – which will save us an additional $1.5M in operational costs;
  3. And (this is a big AND), we give back eight weeks of productivity improvement to 600 developers for every project. Now, when developers call and need a project started, we can finally say “yes” every time, regardless of how many other projects we are working on.

Developers are my customers – letting my customers shave two months off a nine-month project instead of hearing “No, we are too busy”? That, as they say, is priceless!

So, how exactly did we do this?  What lessons did we learn using our own products?

To find out, join me during our VMUG CloudOps SIG webcast this Thursday, July 25th. I’ll share all the details about the operational changes we made to successfully deliver these benefits.

If you’re attending VMworld, we’ll also go deep on the technical aspects of the automation, the management capabilities we developed, and the operational transformation changes that made it all work in the following sessions:

OPT5194 – Moving Enterprise Application Dev/Test to VMware’s internal Private Cloud – Operations Transformation

VSVC4948 – Moving Enterprise Application Dev/Test to VMware’s internal Private Cloud – Architecture, Implementation and Integration

Interested in more CloudOps sessions at VMworld? See our highlights of the Operations Transformation track. For more on automation, check out some of our previous blog posts.

For future updates, follow @VMwareCloudOps on Twitter, and join the conversation by using the #CloudOps and #SDDC hashtags.

Industry Veterans Share Key Lessons on Delivering IT as a Service

Ian Clayton, an ITSM industry veteran, and Paul Chapman, VMware’s Vice President of Global Infrastructure & Cloud Operations, know a lot about IT service delivery. Join Ian and Paul next Tuesday, July 23rd at 9am PT, as they share real lessons learned about delivering IT as a Service.

You will get more out of this brief webcast than most of the sessions presented at expensive conferences!

The webinar will cover:

  • VMware’s own IT service delivery transformation based on cloud
  • Business justification of an ITaaS delivery model
  • Key success factors for driving technology and operational transformation

Outside-in thinking is needed to give IT a winning strategy, but inside-out leadership is required to make the changes that enable successful execution. Don’t miss this opportunity to hear from IT experts as they share real advice on successfully delivering IT as a Service in the cloud era – register now!

We’ll also be live-tweeting during the event via @VMwareCloudOps – follow us for updates! Also join the conversation by using the #CloudOps and #SDDC hashtags.

Tips for Using KPIs to Filter Noise with vCenter Operations Manager

By: Michael Steinberg and Pierre Moncassin

Deploying monitoring tools effectively is both a science and an art. Monitoring provides vast amounts of data, but we also want to filter the truly useful information out of these data streams – and that can be a challenge. We know how important it is to set trigger points to get the most out of metrics. But deciding where exactly to set those points is a balancing act.

We all know this from daily experience. Think car alarms: if limits are set too tight, you can trigger an alarm without a serious cause. People get used to them. They become noise. On the other hand, if limits are too loose, the important events (like an actual break-in) are missed, which reduces the value of the service the alarm is supposed to deliver.

Based on my conversations with customers, vCOps’ out-of-the-box default settings tend to be on the tight side, sometimes resulting in more alerts than are useful.

So how do you make sure that you get the useful alerts but not the noise? I’ve found that assigning Key Performance Indicators (KPIs) to each VM is the best way to filter the noise out. So this post offers some tips on how to optimally use KPIs.

First, Though, a Quick Refresher on KPIs

By default, vCOps collects data for all metrics every five minutes. As part of its normal operations, vCOps applies statistical algorithms to that data to detect anomalies in performance – KPIs are outputs from those algorithmic measurements.

Within vCOps, a metric is identified as a KPI when its level has a clear impact on infrastructure or application health. When a KPI metric is breached, the object it is assigned to will see its health score impacted.

A KPI breach can be triggered in the following ways:

  • The underlying metric exceeds a given value (Classic Threshold).
  • The underlying metric is less than a given value (Classic Threshold).
  • The underlying metric becomes anomalous (Dynamic Threshold). This is a unique capability of vCOps: a ‘normal’ range is automatically calculated so that abnormal values can be detected.

Typically, you would use one of these three options when setting a threshold, but combinations are also allowed. For example, you may want to set a classic threshold that triggers when disk utilization exceeds a certain percentage. This can be combined with a dynamic threshold – where an alert is triggered if CPU utilization goes above its monthly average by more than x%.
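
To make the combination concrete, here is a minimal Python sketch of the two kinds of check. This is not vCOps code – the metric names, thresholds, and the simple average-based ‘normal’ range are illustrative assumptions – but it shows the shape of a classic threshold combined with a dynamic one.

```python
from statistics import mean

def classic_breach(value, upper=None, lower=None):
    """Classic threshold: breach if the metric exceeds an upper bound
    or falls below a lower bound."""
    if upper is not None and value > upper:
        return True
    if lower is not None and value < lower:
        return True
    return False

def dynamic_breach(value, history, tolerance_pct=20):
    """Dynamic threshold: breach if the metric strays more than
    tolerance_pct above its historical (e.g. monthly) average."""
    baseline = mean(history)
    return value > baseline * (1 + tolerance_pct / 100)

# Illustrative samples: disk utilization gets a classic upper threshold,
# CPU utilization is compared against a month of 5-minute samples.
disk_util_pct = 87
cpu_util_pct = 65
cpu_history = [40, 42, 38, 45, 41, 39, 44]  # stand-in for the monthly history

kpi_breached = classic_breach(disk_util_pct, upper=85) or \
               dynamic_breach(cpu_util_pct, cpu_history, tolerance_pct=20)
print("KPI breached:", kpi_breached)
```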

Tips for Optimizing KPIs

KPIs provide the granular information that makes up the overall health score of a component in the infrastructure, such as an application. The overall health score is a combination of statistics for Workload, Anomalies, and Faults.

Overly-sensitive KPI metrics, however, can cause health scores to decrease when there isn’t an underlying issue. In such instances, we need to optimize the configuration of vCOps so that the impact of anomalous metrics on health scores is mitigated.

Here are some ideas for how to do that:

Tip 1 – Focus on Metrics that Truly Impact Infrastructure Health

First, it’s good to limit the number of metrics you put in place.

With too many metrics, you’re likely to have too many alerts – and then you’re still in a situation analogous to having car alarms going off too often to be noticed.

Remember, overall health scores are impacted by any metric that moves outside its ‘normal’ range. vCOps calculates the ‘normal’ range based on historical data and its own algorithms.

Tip 2 – Define KPI Metrics that will Trigger Important Alerts

Next, you want the alerts that you do define to be significant. These are the alerts that impact objects important to business users.

For example, you could have a business application with a key dependency on a database tier. An issue with that database or its performance would impact the user community immediately. You would therefore want to define KPIs around the set of metrics that most closely monitor that database and its supporting infrastructure.

Tip 3 – Use KPIs Across All Infrastructure Levels

In order to see the maximum benefit of KPI metrics, each metric should be assigned to the individual virtual infrastructure object (e.g. a Virtual Machine), as well as to any Tiers or Applications that the Virtual Machine relates to.

This is an important step as it makes the connection between the VM metrics and the application it relates to. For example, it may not be significant in itself that a VM is over-utilized (CPU usage over threshold), but it becomes important if the application it supports is impacted.

Example

Let’s assume a customer has a series of database VM servers that are used for various applications. The VM, Tier, and Application assignments are illustrated in the table below.

VM        Tier    Application
orasrv1   DB      WebApp1
orasrv2   DB      CRMApp1
orasrv3   DB      SvcDesk1

The application team has specified that the CPU Utilization for these VMs should not exceed 90% over three collection intervals (15 minutes). Therefore, our KPI metric is CPU Utilization %.

The KPI metric is assigned to all of the resources identified in the table above: each VM, the DB Tier within each Application, and each Application itself. For example, in addition to the orasrv1 VM, both the DB Tier within the WebApp1 application and the WebApp1 application itself are assigned a KPI for orasrv1.

With these assignments, health scores for the VMs, Tiers and Applications will all be impacted when the CPU Utilization for the respective VM is over 90% for 15 minutes. Virtualization administrators can then accurately tell application administrators when their Application health is being impacted by a KPI metric.
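
As a rough illustration of how such an assignment plays out, here is a short Python sketch. The data structures and the breach check are assumptions made for illustration – this is not how vCOps evaluates health internally – but it captures the rule above: CPU utilization over 90% for three consecutive collection intervals impacts the VM, its Tier, and its Application.

```python
# Mapping taken from the table above: VM -> (Tier, Application)
ASSIGNMENTS = {
    "orasrv1": ("DB", "WebApp1"),
    "orasrv2": ("DB", "CRMApp1"),
    "orasrv3": ("DB", "SvcDesk1"),
}

THRESHOLD_PCT = 90      # the application team's requirement
BREACH_INTERVALS = 3    # three 5-minute collection intervals = 15 minutes

def kpi_breached(samples):
    """True if the last BREACH_INTERVALS samples all exceed the threshold."""
    recent = samples[-BREACH_INTERVALS:]
    return len(recent) == BREACH_INTERVALS and all(s > THRESHOLD_PCT for s in recent)

def impacted_objects(vm, samples):
    """Return every object whose health score this VM's KPI breach would impact."""
    if not kpi_breached(samples):
        return []
    tier, app = ASSIGNMENTS[vm]
    return [vm, f"{app}/{tier}", app]

# Illustrative samples: orasrv1 has been above 90% for the last 15 minutes.
print(impacted_objects("orasrv1", [72, 91, 93, 95]))  # ['orasrv1', 'WebApp1/DB', 'WebApp1']
print(impacted_objects("orasrv2", [88, 91, 84, 92]))  # []
```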

Take-Away

When it comes to KPI alerts, there are three steps you can take to help “filter the noise” in vCOps.

1)   Focus on a small number of metrics that truly impact infrastructure health.

2)   Define KPI metrics that will trigger the important alerts.

3)   Set up these KPI metrics consistently across infrastructure levels (e.g. VM, Tier, Application), so that issues are not missed at any particular level.

For future updates, follow @VMwareCloudOps on Twitter, and join the conversation by using the #CloudOps and #SDDC hashtags.

CloudOps at VMworld – Operations Transformation Track

VMworld, taking place August 25th through August 29th in San Francisco, is the virtualization and cloud computing event of the year.

The Operations Transformation track offers 21 sessions designed to share real-world lessons learned about the changing IT Operations landscape in the Cloud era. Self-service provisioning, automation, tenant operations, hybrid cloud and SDDC architectures are all optimized when operations change.

You can find out how to get the most out of the latest VMware technology by attending sessions focused on these operations transformation topics. Some of the sessions include:

OPT5414 – Automating, Optimizing and Measuring Service Provisioning in a Hybrid Cloud

David Crane, Cloud Operations Consulting Architect, on service provisioning and how automated provisioning can help reduce costs, improve flexibility and agility, speed time to market and improve ROI of cloud deployments.

For more on this topic, check out our Friday Reading List on Orchestration and Automation.

OPT5705 – Balancing Agility with Service Standardization: Easy to Say But Hard To Do

A panel of seasoned IT experts, including VMware’s VP of IT Operations, discusses what does and doesn’t work with service standardization, where services can be tailored to meet unique needs, and best practices for driving a common service definition process across a set of constituents.

For more on standardization, check out our Friday Reading List on Standardization in the Cloud Era.

OPT5051 – Key Lessons Learned from Deploying a Private Cloud Service Catalog

John Dixon of GreenPages Technology Solutions discusses lessons learned from a recent project deploying a private cloud service catalog for a financial services firm.

John Dixon was a co-host in our last #CloudOpsChat on Reaching Common Ground When Defining Services. Check out some of his insights in the recap blog.

OPT5569 – Leveraging Hybrid Cloud to Transform Enterprise IT from a Cost Center to a Revenue Driver

What if you could transform a “cost center” into a consultative center of customer innovation? Learn how you can leverage hybrid cloud to turn your “cost center” into a revenue driver with Jeffrey Ton, SVP Corporate Connectivity & CIO, Goodwill Industries of Central Indiana, and John Qualls, Senior Vice President of Business Development, Bluelock.

For more on this topic, read our webinar recap blog on 5 key steps to effective IT operations in a Hybrid world.

OPT4732 – Leveraging IT Financial Transparency to Drive Transformation

Achieving financial transparency is fundamental to IT transformation. This session shows you how to leverage IT financial transparency to drive the transformation your business needs.

Read Khalid Hakim’s recent blog on Calculating Your Cloud Service Costs for more on this subject.

OPT4689 – Operations Transformation – Expanding the Value of Cloud Computing

A forcing function for change, cloud computing helps IT organizations move away from focusing only on siloed technology challenges. Phil Richards and Ed Hoppitt explain how to expand the value of cloud computing.

Ed Hoppitt is also a writer for the VMware CloudOps blog. Check out his work here.

OPT5215 – Organizing for Cloud Operations – Challenges and Lessons Learned

Addressing the organizational changes that must take place for IT to successfully operate a cloud environment and provide hybrid-cloud services, as well as lessons learned from customers who have experienced this change.

Want to learn more? Check out Kevin Lees’ 3-part series on this topic – Part 1, Part 2, Part 3

OPT5489 – Pivot From Public Cloud to Private Cloud with vCloud and Puppet

Edward Newman and Mike Norris from EMC explain how EMC has built a private cloud, pulled workloads back in from public cloud, and saved a bunch of money – real-world evidence that private cloud can be cheaper than public cloud.

OPT4963 – SDDC IT Operations Transformation: Multi-customer Lessons Learned

Technical account managers Valentin Hamburger and Bjoern Brundert of VMware share lessons learned from working with multiple customers on how to overcome ancient, siloed IT processes and holistically enable your infrastructure to leverage an automated, policy-driven datacenter.

OPT5697 – Symantec’s Real-World Experience with a VMware Software-Defined Data Center

Learn about the real-world experience of Symantec’s IT organization, which has deployed one of the world’s largest private clouds in a VMware-based Software-Defined Data Center.

OPT5474 – The Transformative Power and Business Case for Cloud Automation

Understand the terminology and the key success factors behind the concepts from two industry-leading automation experts. Cut through the clutter and attend this session to learn from use cases that highlight the value of different types of automation, as well as proven approaches for building a business case for each.

Read this blog post by Kurt Milne for more information on task automation economics!

OPT5593 – Transforming IT to Community Cloud: A Canadian Federal Government Success Story

The story of Shared Services Canada, which scaled its private cloud to meet the needs of a community of 43 departments on a private vCloud deployment.

OPT5315 – Transform IT Into a Service Broker – Key Success Factors

The concept of an IT service broker is compelling. This session will explain key success factors in transforming IT into a service broker.

OPT5656 – VMware Customer Journey – Where are we with ITaaS and Ops Transformation in the Cloud Era

Kurt Milne, Director of CloudOps at VMware, and Mike Hulme, Director of Enterprise Marketing at VMware, discuss where we are with ITaaS and Ops Transformation in the cloud era. Understand what your peers are doing that could benefit you, and learn what drives value across SMB, Commercial, and Enterprise accounts on multiple continents.

Read more about how CloudOps represents a new way of managing IT in the Cloud Era.

OPT5194 – VMware Private Cloud – Operations Transformation

Venkat Gopalakrishnan, Director of IT at VMware, offers operations transformation lessons learned from VMware’s own vCloud deployment. Ask the expert – he has both VMware product and operations expertise.

We hope this guide will help you put together an unforgettable VMworld schedule!

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

Rethinking IT for the Cloud, Pt. 1 – Calculating Your Cloud Service Costs

By: Khalid Hakim

So you’re the CIO of an organization and you’ve been asked to run IT like a business. What do you do?

You can start by seeing IT as a technology shop, with “services” displayed on its shelves. Each service is price-tagged, with specs printed on the back-tag. A service catalog is available for customers to pick up and request services from. Each service or set of services is managed by a “service manager/owner” role. Your IT shop would have an Income Statement (profit and loss, or P&L) and a Balance Sheet.

Think of it – in other words – as a business within a business: IT is just a smaller organization within the main business org. Where’s the value to you in that? For one thing, your boss is right: IT should be a business enabler, a revenue supporter, and a value creator. For another, it helps you ditch your colleagues’ long-held impression that IT is nothing more than a revenue drain.

Next, you need to show exactly how your organization contributes to the success and profitability of the business. How can the CEO and CxOs further realize the value of the IT you’re supplying? How can you calculate the contribution of every dollar of investment in IT to their net income? These are just a few of the questions that you need to consider when positioning IT on a critical value path.

Cloud is Here to Help

As you look to transform IT from a passive order-taker to an IT service broker, or even to a strategic business partner, you’ll likely look to cloud computing for agility, reliability, and efficiency. Cloud can deliver all of these things, with stunning results. But this transformation cannot happen without a paradigm shift in how you operate and manage your technology.

Luckily, cloud computing embraces consumerization and commoditization and is a perfect fit for the IT shop/P&L model: everything is expressed in terms of “services” and business value. If I could introduce a new English word, it would be “servicizing,” as in “servicizing your IT.” Part of this transformation means moving from C-bills to S-bills – that is, moving from component-level bills (e.g. an IT bill that lists IT components) to service bills, which are clearer and more understandable. And to begin this process, you need to “servicize” your whole IT context.

There are multiple steps involved here, but for any of this to be worthwhile, what you do has to be justifiable within your new cost/benefit framework. So you need to start off with a true understanding of how much each and every service is costing your company.

In the remainder of this post, I’m going to suggest a few key points that will help you identify and calculate what each cloud service costs. Future blog posts in this series will address other important steps to IT transformation for the cloud, such as the importance of automating your IT cost transparency as well as a step-by-step guide to tagging costs as CAPEX and/or OPEX.

Calculating Cloud Service Costs

Step 0: Define your cloud service – I am calling this step zero because you first need to truly understand what makes up your cloud service before you can go any further. Service definition is a separate exercise and discipline whose foundations should be deeply rooted in your organization if you want to describe it as “service-oriented.” Defining a cloud service helps you see the boundaries of your service, as well as correctly understand and identify its components. And it solves one of your biggest service cost challenges, reducing the “unabsorbed costs” bucket by clearly identifying all cost components, including your service’s technology, processes and team.

Step 1: Identify direct and indirect fixed costs – With an accurate service definition, all components that contribute to your service delivery (technology, processes, and team) are now identified. The next step is to identify the direct costs that these drivers and elements contribute to your service. In addition, you’ll need to identify all indirect fixed cost drivers and apply the allocation percentage that was agreed upon when your service’s cost model was established. Your support contract is a common example of an indirect fixed cost: the cost of your support contract should be split over the number of products and calls, as previously detailed in your contract.

Step 2: Identify direct and indirect variable costs – Another challenge is dealing with your variable costs and how to allocate them to the services that depend on these costs. Much of this should have been defined in the service’s cost model, so you should apply those same policies on the identified variable-cost drivers and elements. Your monitoring tool is a great example of an indirect variable cost, as the costs need to be distributed over your fluctuating number of applications or services being monitored at any given time.

Step 3: Identify any unabsorbed costs – The “unabsorbed costs” bucket is a group of cost drivers and elements whose costs you cannot attribute to any particular service, meaning they must be attributed across all services. During the development of your service’s cost model, you need to decide how to deal with such costs. Typically, there will be a certain uplift amount that needs to be added or allocated to each service. A good example of this is the cost of labor (e.g. service managers) that should be distributed across all services.

Step 4: CAPEX/OPEX tag and adjust – There is no major decision-making in this step, as most of the CAPEX and OPEX discussions should have taken place when you purchased your cloud service components. However, it is very important to tag each cost as CAPEX or OPEX (or both in some cases), because that will eventually impact the way you distribute and allocate the operational or depreciated costs of each element.

Step 5: Finalize your service cost calculations – After identifying and defining all of your cost units (e.g. per user, or per unit of consumption such as per GB) and metering options (e.g. hourly, weekly, monthly), finalize your service cost per cost unit, taking into account all the elements gathered in the previous steps.
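
To tie the steps together, here is a minimal Python sketch of the arithmetic. All of the figures, allocation percentages, and the depreciation period are made-up assumptions for illustration – your own cost model will define the real values – but the structure follows Steps 1 through 5 above.

```python
# Illustrative monthly cost model for one cloud service (all figures are invented).
direct_fixed      = 12_000          # Step 1: e.g. dedicated hosts, licenses
indirect_fixed    = 30_000 * 0.10   # Step 1: support contract, 10% allocated here
direct_variable   = 4_500           # Step 2: metered storage/compute consumed this month
indirect_variable = 8_000 * 0.15    # Step 2: monitoring tool, 15% of monitored workloads
unabsorbed_uplift = 0.05            # Step 3: 5% uplift for service-manager labor, etc.

# Step 4: CAPEX items enter the monthly cost as depreciation, OPEX items as-is.
capex_purchase      = 72_000        # hardware bought up front
depreciation_months = 36
monthly_capex = capex_purchase / depreciation_months

subtotal = direct_fixed + indirect_fixed + direct_variable + indirect_variable + monthly_capex
monthly_service_cost = subtotal * (1 + unabsorbed_uplift)

# Step 5: express the cost per cost unit (here, per GB consumed per month).
gb_consumed = 9_000
print(f"Monthly service cost: ${monthly_service_cost:,.2f}")
print(f"Cost per GB:          ${monthly_service_cost / gb_consumed:.4f}")
```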

In summary, when preparing your IT team for cloud computing, keep in mind the following:

  • Successfully implementing cloud computing in your company starts by changing the way you see IT (and making sure everyone on your team is aware and on-board as well).
  • It is essential to carefully and correctly define your cloud service and to keep in mind the cost model you established for your service as you do so.
  • Identifying the costs of your cloud service will let you illustrate the value of IT at your company and show how your cloud service positively impacts your business as a whole.
  • You can follow the 5-step process outlined above to ensure that you have fully identified your costs.

You may not be personally on the hook to figure all this out, but service owners/managers or someone in your IT department probably is. So why not forward this post to folks you work with in IT, and suggest that they attend the IT Financial Management Association Conference in Savannah, Georgia next week? I’ll be hosting a workshop on Monday, July 8th at 8am on cloud IT service financial management, and on Wednesday, July 10th at 10am, I’ll be presenting an overview of cloud service financial management.

Stay tuned for the next post in this series, where I will discuss service definition in more detail. In the meantime, if you’re interested in reading more on the transformation of IT, check out these other posts.

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

Automation – The Scripting, Orchestration, and Technology Love Triangle

By Andy Troup

When I speak with my customers, one message comes across resoundingly: “WE WANT TO AUTOMATE.” This is the sweet spot for cloud solutions, as they have built-in automation to deliver the defining benefits of cloud computing, such as on-demand self-service, resource pooling, and rapid elasticity (as defined by NIST).

However, upon scratching the surface and digging a little deeper, I’ve found that when I’m told “yes, we’ve got automation,” it typically means a lot of effort has gone into developing a whole heap of scripts written to solve very specific problems. This, I would argue, is not the best way to achieve automation.

I was in conversation with a customer a few weeks ago who wanted to automate a particular part of their provisioning process, and my recommendation to them was “DON’T DO IT.” Why did I say this? Well, the process was broken and inefficient, relied on spreadsheets and scripts, and meant there was constant rework to end up with a satisfactorily provisioned system. Their provisioning process took weeks and weeks. There was no point in automating this broken process – what needed to happen was that the process had to be fixed or changed first. I won’t go into any more detail about this particular problem, but my point is that sometimes you have to take a step back and see if there are other ways of solving a particular problem.

In summary – there’s no point in automating a broken process.

So, why do we want to automate our IT systems and the provisioning of them anyway? Primarily because we want two things:

  1. To take the boring, repeatable activities that many IT administrators undertake and get a system to do them instead. This frees up time for administrators to do more interesting and difficult work.
  2. To remove the potential for errors. Anything that is done as a manual activity involving people is liable to be inconsistent and error-prone (I say liable, but really we all know that it will be inconsistent and error-prone). Cloud solutions are all based on the premise that everything is standardized, and thus we need to remove any activity that introduces unreliability.

OK, so we’ve now established that automation is a good thing. All we need to do now is work out HOW we’re going to automate, and this may introduce some difficult decisions.

So what are the automation options? Well, in my mind automation comes in three different flavours which should be used together to solve the automation challenge. Here they are with some definitions I found:

  1. Script – programs written for a special runtime environment that can interpret and automate the execution of tasks which could alternatively be executed one-by-one by a human operator. (http://en.wikipedia.org/wiki/Script_(computing))
  2. Orchestration – describes the automated arrangement, coordination, and management of complex computer systems, middleware, and services. (http://en.wikipedia.org/wiki/Orchestration_(computing))
  3. Policy – Policy-based management is an administrative approach that is used to simplify the management of a given endeavor by establishing policies to deal with situations that are likely to occur. (http://whatis.techtarget.com/definition/policy-based-management)

In terms of their use, the image below shows how I believe they should be used and in what quantities. As you can see, we should be aiming for as much policy implementation as possible, with as little scripting as we can get away with.

If you have a process you’d like to automate, work up the pyramid from the bottom to find the right solution.

So the first question you should ask yourself is “can I create a policy, or several policies, to solve the problem?” This will depend on the technology available to enforce the policy, but it should be the first port of call. It may even be worth investing in the technology to make a policy implementation possible. The overhead of creating and maintaining policies is small, and they will provide a robust solution to your problem with reliability and consistency.

If it isn’t possible to create a policy to solve the challenge, next consider orchestrating a solution. This will provide a reusable, standardized capability that has an element of management/maintenance overhead and will be reliable.

Finally, if neither policy nor orchestration will work for you, then use scripting as a last resort. Why a last resort? Scripting is a tactical, bespoke solution to a specific requirement; it will require management and maintenance for its entire life, which in turn incurs cost and makes it less reliable.
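
To make the contrast concrete, here is a short, product-agnostic Python sketch. The policy fields and the tiny evaluation loop are assumptions made purely for illustration – real policy engines are far richer – but it shows why a declarative policy interpreted by a generic engine scales better than a one-off script hard-wired to a single case.

```python
# Script flavour: an imperative, one-off fix hard-wired to one specific case,
# which has to be maintained by hand for its entire life.
def script_fix_web_tier(vm):
    if vm["name"].startswith("web") and vm["cpu_pct"] > 85:
        print(f"Scaling out {vm['name']}")

# Policy flavour: declarative rules plus a generic engine that applies any
# number of such rules consistently across the whole estate.
POLICIES = [
    {"match": {"tier": "web"}, "condition": ("cpu_pct", ">", 85), "action": "scale_out"},
    {"match": {"tier": "db"},  "condition": ("disk_pct", ">", 90), "action": "add_storage"},
]

def evaluate(vm, policies):
    ops = {">": lambda a, b: a > b, "<": lambda a, b: a < b}
    for p in policies:
        metric, op, limit = p["condition"]
        if all(vm.get(k) == v for k, v in p["match"].items()) and ops[op](vm[metric], limit):
            yield p["action"]

vm = {"name": "web01", "tier": "web", "cpu_pct": 91, "disk_pct": 40}
script_fix_web_tier(vm)              # prints: Scaling out web01
print(list(evaluate(vm, POLICIES)))  # prints: ['scale_out']
```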

So in summary, when you are considering automating a process:

  • Step back from the automation challenge and consider the options. They may not be what you expected.
  • Work up the “Love Triangle” from the bottom.
  • If you can’t implement a policy, consider orchestration and use scripting as a last resort.

For more great insight on automation, see our previous posts highlighting automation economics and IT automation roles.

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.