
Monthly Archives: May 2013

The Illusion of Unlimited Capacity

By: Andy Troup 

I was at a customer workshop last week, and I used a phrase that I’ve used a few times to describe one of the crucial capabilities of a successful cloud computing service, namely “The Illusion of Unlimited Capacity.” It got a bit of a reaction, and people seemed to understand the concept quite easily. So apart from its sounding quite cool (maybe I should get out more), why do I keep on using this term?

Well, in cloud computing, we all know that there is no such thing as unlimited capacity – everything is finite. Every cloud provider only has a limited number of servers, a limited amount of storage capacity, and a limited number of virtual and physical network ports – you get the idea, it’s all limited, right?

Paradoxically, though, providers of cloud resources have to make sure their customers believe the opposite: that there is no end to what can be consumed.

The National Institute of Standards and Technology (NIST) defines one of the characteristics of cloud computing as on-demand self-service; i.e. the user can consume what they want, when they want it. Now, for cloud providers to provide on-demand self-service, they need to be confident that they can fulfill all the requests coming from all their consumers, immediately. They need to maintain, in other words, an illusion of unlimited capacity.

If at any point a consumer makes a request and the cloud portal they use responds with a “NO” because it has run out of cloud resources, the illusion is gone. That has real consequences. Because it is so easy for consumers to move between cloud providers, the provider will very likely have lost them as customers and will find it very hard to win them back. Remember, even for internal IT cloud providers, it’s a competitive marketplace and the customer is king.

So, when defining your cloud strategy, you want to make sure that maintaining ‘the illusion of unlimited capacity’ is on your list. It may not be something you need to consider initially, but when demand for your services increases, you need to be ready to deal with the challenge. To prepare for it, here are 5 things you should start thinking about:

  • Understand your customers – build a strong relationship with your customers, understand their business plans, and use this information to understand the impact those plans will have on the demand for your cloud services.
  • Implement the appropriate tooling – so you can not only understand demand for your cloud capacity today, but also forecast future demand.
  • Consider the Hybrid Cloud – think about how you would burst services in and out of a hybrid cloud and when you would need to do it. Before you actually need to do this, make sure you plan, prepare and automate (where possible), so that everything is in place when it’s needed. Don’t wait until it’s too late.
  • Train users on service consumption etiquette – if they know they can get what they need when they need it, they will be less inclined to hoard resources. And if they aren’t hoarding resources, the pressure to predict their future demand (which can be difficult) will be reduced, because resources are being used more efficiently. Why not agree that they won’t have to plan capacity if they “turn it off” when done, thus freeing resources back to the pool and further increasing spare capacity?
  • Kill zombie workloads – be aware of services that aren’t being used and turn them off (after having a conversation with the customer). Also, encourage the use of leases for temporary services when appropriate. A rough sketch of how you might flag zombie candidates follows this list.
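As an illustration of the zombie-workload point, here is a minimal sketch of how candidates might be flagged from utilization data. The metric names, thresholds, and data structure are assumptions made for the example, not a prescription, and anything flagged should still be confirmed with its owner before being switched off.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WorkloadStats:
    """30-day utilization summary for one workload (illustrative fields only)."""
    name: str
    owner: str
    avg_cpu_pct: float      # average CPU utilization over the window
    avg_net_kbps: float     # average network throughput over the window
    days_since_access: int  # days since anyone logged in or deployed to it

def find_zombie_candidates(workloads: List[WorkloadStats],
                           cpu_threshold: float = 2.0,
                           net_threshold: float = 5.0,
                           idle_days: int = 30) -> List[WorkloadStats]:
    """Flag workloads that look idle on every signal we track.

    Thresholds are illustrative assumptions; tune them to your environment.
    """
    return [
        w for w in workloads
        if w.avg_cpu_pct < cpu_threshold
        and w.avg_net_kbps < net_threshold
        and w.days_since_access >= idle_days
    ]

if __name__ == "__main__":
    sample = [
        WorkloadStats("dev-build-03", "app-team-a", 0.4, 1.2, 45),
        WorkloadStats("web-frontend-01", "app-team-b", 35.0, 900.0, 0),
    ]
    for w in find_zombie_candidates(sample):
        print(f"Candidate zombie: {w.name} (owner: {w.owner}) – time for a conversation")
```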

Finally, going back to the essential characteristics of cloud computing as defined by NIST (the definition is a very short document, for those of you who haven’t read it), one other characteristic is rapid elasticity.

If you think about it, this article is really all about rapid elasticity. It’s just another way of saying that you need to maintain the illusion of unlimited capacity. Now, put on your top hat, hold on to your magic wand, and keep the illusion going.

For future updates, follow @VMwareCloudOps on Twitter and join the conversation using the #CloudOps and #SDDC hashtags.

What Do We Mean by IT Services in the Cloud Era?

By Kevin Lees

You hear it all the time from cloud evangelists: instead of delivering based on projects, IT should now be delivering around a common set of services.

It’s not a new idea—but cloud computing promises to finally make it a reality.

Before we get too excited, though, we should ask: what do we actually mean by cloud services? That’s not something cloud advocates always make clear.

So here’s an example:

The other week I was talking with a customer who runs a cloud that supports production dev/test environments for a government agency. These environments in turn support mission-critical applications that play a major role in maintaining the public’s health.

From a service perspective, the tenant ops team is identifying and building a set of common development platforms as virtual applications. In this case each platform consists of three tiers, with each tier running a Windows operating system that’s been pre-built to meet government security policies. The composite platforms all have monitoring drivers already installed, and also feature commonly used development environments – in this case either a Microsoft .NET environment or a Java-based one.

Collectively, that creates a common virtual dev/test vApp pre-built with a lot of the core capabilities and requirements needed for this type of mission-critical application development. My customer’s team is then offering this multi-tier stack as a “service” via self-service, on-demand provisioning.

In the past, it could have taken two to three months to stand up something like this for a new round of development and testing. Now, with these prepackaged, common services, a new development environment can be deployed in less than an hour.

It’s a great example of how quickly you can provision – not only from an infrastructure perspective, but also in a way that means developers don’t have to repeatedly start out with raw infrastructure and build all of their own environments.

This standardized, pre-packaged development environment can also be used across multiple development teams and even across multiple departments. Each may need to do some tweaking for their particular area, but it saves everyone an enormous amount of work.

For future updates, follow @VMwareCloudOps on Twitter and join the conversation using the #CloudOps and #SDDC hashtags.

Service Definition – The Tradeoff Between Standardization and Agility

By Rohan Kalra and Pierre Moncassin

In the client-server era, IT demonstrated responsiveness by designing infrastructure to meet the technical requirements of the various applications the business relied on to do work. Developers spec’d systems. Ops built the systems. Devs changed the specs. The back and forth continued until the systems were live in production.

There were attempts to enforce architecture standards designed to control the chaos of having every system be a unique work of art, but business demands typically trumped IT’s need for simplicity. If developers for a critical business application demanded some unique middleware configuration, they usually got what they requested.

As a result, most IT organizations have racks full of unique, one-off systems that are often hard to support. “A museum of past technology decisions” is one way to describe the typical enterprise datacenter landscape.

Cloud Changes Everything

Cloud computing changes this paradigm. With cloud, developers and users experience the value of fast access to standardized commodity compute resources. By accepting and designing around standard resource configurations, developers no longer need to predict usage levels to set capacity requirements, and no longer have to wait through long procurement cycles. Similarly, by accepting one-size-fits-all, consumers can get immediate access to a wide range of ready-to-use apps.

The trade-off IT consumers make is essentially one of releasing control over technical assets in order to gain control over business processes. In return for accepting increased standardization (typically at the ‘nuts and bolts’ level, e.g. infrastructure, catalog, OLAs, charging models), they get unprecedented agility at the business level (“on-demand” IT, both in initial provisioning and in scaling as usage levels change).

In the cloud era, IT demonstrates responsiveness by giving developers and users immediate access to standard IT services accessed and then scaled on demand.

As a result, IT success in the cloud era depends, to a large extent, on IT consumers understanding the tradeoff and appreciating the value of standardization.

Start with Common Service Definition

The first step to achieving standardization is getting agreement on a common service definition. This includes getting multiple groups that have traditionally requested and received custom work to agree on the details of standard services. There is an art to building this consensus, as different consumers with unique requirements need to come together to make it a success. The key is communication and consistency, from the collection of requirements through to the delivery of services. (More on this process in a future blog post.)

Another critical step is standardizing and centralizing an organization’s service catalog and portal. This allows for a consistent and secure customer experience that provides access across all services regardless of the underlying environment – physical, virtual, as well as private and public cloud resources.

Standardization also enables IT to be a true service broker, picking the right environment to meet the needs of each service or workload. A service broker strategy includes policy-based governance, service-based costing, and end-to-end life cycle management across all types of internal and external services.

Today, organizations that understand the need for standardization are the ones transforming themselves to be more responsive with cloud-based operating models. For them, standardization is the driver to both increase business agility, and become more efficient from an OPEX perspective.

Key actions you can take:

1. Acknowledge the problem.

Is this true within your organization?

  • Multiple single points of failure?
  • Specific individuals supporting legacy applications without documented runbooks or recovery procedures?
  • Continuous fire-fights due to complex architectures, leading to business downtime?
  • Inefficient manual procedures?
  • War-room-like setups to solve problems, with little to no root cause analysis or preventive problem solving for the future?

2. Before embarking on the journey, candidly take stock of what is actually being delivered today. Ask probing questions about your current-state services.

  • What service levels are actually being delivered (not just promised ‘on paper’)?
  • What services look ‘gold plated’ and could be simplified?
  • What services are never, or only occasionally, used?

Once you have a firm baseline, you are ready to start the journey.

3. Understand it’s a journey and it takes time. There is no big bang answer to solving this problem.

  • Start with small wins within your organization’s cloud transformation.
  • Development environments are ideal proving grounds.
  • Establish a cloud-first policy.

4. Create a cloud strategy and focus on building business consensus through business communication and outreach.

For more on this topic, join Khalid Hakim with John Dixon of GreenPages for the May 30th #CloudOpsChat on Reaching Common Ground When Defining Services!

For future updates, follow us on Twitter at @VMwareCloudOps and join the conversation by using the #CloudOps and #SDDC hashtags.

Reaching Common Ground When Defining Services – Join Us For #CloudOpsChat!

An optimized service definition process can make or break the success of hybrid clouds or Software-Defined Data Centers (SDDC). But even if you have tools and processes in place to enable automated provisioning, you still need key stakeholder agreement on the makeup of standard services and resource configurations.

  • Standardized services need to meet the needs of those who request and consume the services. But they also need to make sense to those in IT responsible for both the automation that delivers the services and the ongoing support.
  • Standardization helps increase flexibility at the business process level. But rigid service definitions can also inhibit those who both consume and deliver the services.

So how can you meet the needs of multiple groups and find common ground when it comes to defining services?

Find out by joining our next #CloudOpsChat on “Reaching Common Ground When Defining Services” taking place on Thursday, May 30th at 11am PT.

The event will be co-hosted by two CloudOps pros who have helped multiple VMware customers reach common ground:

  • John Dixon, Consulting Architect at GreenPages (see John’s posts on GreenPages Journey to the Cloud blog)
  • Khalid Hakim, Cloud Operations Architect at VMware

During the chat, we will answer the tough questions:

  • What service components should be included in a standard service definition?
  • What components can be flexible for modification around the edges?
  • Are there obvious points of abstraction that help balance standardization and flexibility?
  • Are there recommended approaches to getting multiple groups of users to reach consensus?
  • Are there recommended approaches to balancing the needs of both IT and service consumers?
  • What happens if key stakeholders don’t reach consensus?

Here’s how to participate in #CloudOpsChat:

  • Follow the #CloudOpsChat hashtag (via TweetChat, TweetGrid, TweetDeck, or another Twitter client) and watch the real-time stream.
  • On Thursday, May 30th at 11am, @VMwareCloudOps will pose a few questions using the #CloudOpsChat hashtag to get the conversation rolling.
  • Tag your tweets with the #CloudOpsChat hashtag. @reply other participants and react to their questions, comments, thoughts via #CloudOpsChat. Engage with each other!
  • #CloudOpsChat should last about an hour.

In the meantime, feel free to tweet at us at @VMwareCloudOps with any questions you may have. We look forward to seeing you in the stream!

The Lowly Metric Has Its Day in the Sun

By Rich Benoit

Back in the day, I would have killed for a tool like vCOps, an analytics tool that uses dynamic thresholds to make sense of the myriad activity metrics that exist in an IT environment. Without dynamic thresholds that identify normal behavior, admins like me were forced to use static thresholds that never seemed to work quite right. Static thresholds tended either to be set too low, resulting in false positives, or too high, so that by the time they were tripped, the support desk had already started receiving calls from disgruntled users.

Tried, but Failed

  • One approach I tried in order to make sense of the cloud of data coming from multiple monitoring tools was to combine several metrics to get a more holistic view. Combined metrics also rely on static thresholds and are similarly plagued with false positives. But they introduce the additional problem of having to figure out which of the underlying metrics actually caused the alarm to trip.
  • Another approach I tried was using end-user experience monitoring, or end-to-end application monitoring. Instead of trying to estimate the performance of an application by looking at the sum of all of its components, I could instead look at the simulated response time for the typical user and transaction. Another end-to-end monitoring tactic was to employ passive application sniffers that would record the response time of transactions. But with both approaches, I was still dependent on static hard thresholds that were invariably exceeded on a regular basis. For example, it wouldn’t be unusual for an application to exceed its 2-second response time goal during regular periods of peak usage. So I had to know when it was normal to exceed the allowed threshold.  In other words, I had to know when to ignore the alarms.
  • Static thresholds also impacted performance monitoring. Other admins would ask, “Did this just start?” or “Is the performance issue the result of a change in the environment?” The monitoring tools wouldn’t provide the needed data. So we would have to roll up our sleeves and try to figure out what happened. Meanwhile the system would be down or just struggling along. Many times the problem would go away after a certain amount of time or after a reboot, only to resurface another day.

In the end, except for a few cases, we just turned off the monitors and alarms.

A Better Approach

That is why I would have killed for vCOps. vCenter Operations Management Suite is built on an open and extensible platform that works with physical and virtual machines. It is a single solution that works with a variety of hypervisors and fits either on-premises or public cloud environments.

It collects and stores metrics over time and works behind the scenes to establish dynamic thresholds. It employs around 18 different algorithms that compete to best fit any one of the millions of metrics it can track. Some algorithms are based on time intervals and others on mathematical models.

With vCOps I can now designate specific metrics as KPIs for additional granularity. For example, the tool would learn that it is normal for response times to be in the 2 to 4 second range on Monday mornings, and if a response time falls outside that normal range, above or below, I can now have a KPI Smart Alert generated.
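To make the idea of a dynamic threshold concrete, here is a deliberately simple sketch of one possible approach: baseline each metric by hour of week and flag samples that fall well outside the learned range. This is an illustration of the general technique only, not how vCOps actually implements its algorithms.

```python
import statistics
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

class DynamicThreshold:
    """Toy hour-of-week baseline: learn a 'normal' range per (weekday, hour) bucket.

    Illustrative sketch only; real analytics tools use many competing,
    far more sophisticated models per metric.
    """

    def __init__(self, sigmas: float = 3.0, min_samples: int = 10):
        self.sigmas = sigmas
        self.min_samples = min_samples
        self.history: Dict[Tuple[int, int], List[float]] = defaultdict(list)

    def observe(self, ts: datetime, value: float) -> None:
        """Record a metric sample so the baseline keeps learning."""
        self.history[(ts.weekday(), ts.hour)].append(value)

    def is_anomalous(self, ts: datetime, value: float) -> bool:
        """Return True when a sample falls outside the learned range, above or below."""
        samples = self.history[(ts.weekday(), ts.hour)]
        if len(samples) < self.min_samples:  # not enough data to judge 'normal' yet
            return False
        mean = statistics.mean(samples)
        stdev = statistics.pstdev(samples) or 1e-9
        return abs(value - mean) > self.sigmas * stdev
```

The point of the sketch is simply that “normal” is learned from history per time bucket rather than hard-coded, which is what removes the too-low/too-high problem of static thresholds.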

Another thing that I can use is the Early Warning Smart Alert that detects change in the environment when too many anomalies occur, such as when too many metrics are outside their normal operating range. I can use the various dashboards and detail screens to view the metrics over time, so that instead of wondering whether the issue is the result of a capacity trend or something changing / breaking, I can look and quickly see, “Oh, there’s the problem. Something happened at 1:15 on system X that caused this service to really slow down.”

Now, after more than 20 years in IT, I can finally start to use the multitude of metrics that have been there just waiting to be leveraged.

To get the most out of monitoring tools, consider using vCOps’ range of capabilities, including:

  • The ability to track KPIs within the infrastructure, such as Disk I/O or CPU Ready, or leverage the vSphere UI so that you know if your infrastructure has additional capacity or not.
  • Various KPI Super Metrics within the application stack (e.g. cache hit rate or available memory) that alert you when things are outside of a normal range.
  • The power to see exactly how an environment is performing on a given day, and the ability to isolate which component is the source of the issue.
  • The means to track and report the relative health of not only your components, but your services as well, without having to view everything as up or down at the component level and guess if the application or service is OK.

And it’s all possible because we can now actually use the lowly metric.

For future updates, follow @VMwareCloudOps on Twitter and join the conversation using the #CloudOps and #SDDC hashtags.

Refresher Course in Automation Economics

It’s a key question in developing a private or hybrid cloud strategy: “What processes should we automate?”

There are plenty of candidates: provisioning; resource scaling; workload movement. And what about automating responses to event storms? Incidents? Performance issues? Disaster recovery?

To answer the question, though, you need to first establish what you’re looking to gain through automation. There are two basic strategic approaches to automation, each with specific value propositions:

  • task automation – where the proposition is more, better, faster
  • service automation – where you’re looking to standardize and scale

In my last post, I looked at how the automation strategy determines your HR needs.

In this post, I’ll highlight a simple economic model that can be used to cost justify task automation decisions. Next time, I’ll refine the math to help analyze decisions about what to automate when pursuing a service automation strategy.

The Cost Justification for Task Automation – the Tipping Point

From a cost perspective, it makes sense to automate IT tasks if:

  • the execution of the automated task has a lower cost than the execution of a manual version of the task.
  • the automated process can be run a large number of times to spread the cost of development, testing, and ongoing maintenance of the automation capability.

Brown and Hellerstein at the IBM Thomas J. Watson Research Center expressed the idea in a simple model.[1] It compares the fixed and variable costs of a manual process versus an automated version of the same process. The cost calculation is based on the variable N, which represents the number of times the automated process will execute.

IT organizations typically automate existing manual processes. So we consider the fixed cost of developing the manual process as part of the automated process costs.
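Written out (this is my reconstruction of the model just described, not the authors’ exact notation), with N the number of process executions, F the fixed costs, and V the per-execution variable costs:

\[
\begin{aligned}
C_{\text{manual}}(N) &= F_{\text{manual}} + N \cdot V_{\text{manual}} \\
C_{\text{automated}}(N) &= F_{\text{manual}} + F_{\text{automation}} + N \cdot V_{\text{automation}}
\end{aligned}
\]

The manual fixed cost appears in both totals because, as noted above, the manual process already exists before it is automated.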

With these two equations, we can solve for an automation tipping point Nt. Nt, then, is the number of times a process is executed at which it becomes cost effective to automate the process.
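Setting the two cost totals equal and solving (again in my notation, under the assumption that automation lowers the per-execution cost, i.e. the automation variable cost is below the manual variable cost):

\[
N_t = \frac{F_{\text{automation}}}{V_{\text{manual}} - V_{\text{automation}}}
\]

Automation pays for itself once the expected number of executions exceeds Nt.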

Changing the task automation tipping point

Now, what actions could we take that would shift the tipping point? We might:

1. Reduce automation fixed costs. If we can drive down automation fixed costs, automation becomes economically attractive at a lower number of process executions.

Automation fixed costs include purchasing and maintaining the automation platform, as well as standardizing process inputs, ensuring the process is repeatable, developing policies, coding automation workflows based on those policies, testing each automation workflow, documenting errors, and establishing exception-handling procedures. We also need to add in the ongoing maintenance and management of automation routines that may change as IT processes evolve. If any of this work can become highly standardized, Nt will be lower, which will in turn increase the scope of what can be further automated.

2. Minimize automation variable costs. Reducing automation variable costs also makes automation attractive at a lower number of executions.

Variable costs include both the cost of each automation execution and the cost of managing exceptions that typically are triaged via manual resolution processes. With a very large number of process executions, the variable cost of each incremental automated process execution would essentially be zero except for costs related to handling exceptions such as errors and process failures. Standardizing infrastructure and components configurations, and thus management processes, reduces exceptions and lowers the tipping point.

3. Pick the right tasks. Automating manual processes with a high cost of execution is an obvious win. The slower and harder the manual task, the higher the cost of each execution, and the lower the tipping point for automating the process.

Benefits other than cost reduction

Automation offers benefits beyond cost reduction, of course. In the cloud era, demand for agility and service quality are also driving changes in the delivery and consumption of IT services.

Automation for agility 

Agility is key when it comes to quickly provisioning a development or a test environment, rolling it into production, avoiding the need to spec hardware, accelerating time to market and reducing non-development work. Typically, 10-15% of total development team effort is spent just configuring the development environment and its attendant resources. Automation can make big inroads here. Note, too, that agility and speed-to-market factors, which generally have a revenue-related value driver, typically aren’t included in task automation tipping point calculations.

Automation for service quality

Automation promises greater consistency of execution and reduced human error, quality-related benefits that also aren’t factored in the calculations above. Downtime has a cost, after all. Deploying people with different skills and variable (and often ad hoc) work procedures at different datacenter facilities, for example, directly impacts service quality. Automated work procedures reduce both human error and downtime.

Back to the math

Really, we should add the quality-related costs of error and inconsistency to our manual variable costs, since they mirror how automation error-recovery costs are calculated.

To account for the manual process quality costs, the tipping point calculation could replace “Manual variable costs” with “(Manual variable costs + Manual quality costs)” in the denominator.
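In the same notation as the earlier sketch, the quality-adjusted tipping point would be:

\[
N_t' = \frac{F_{\text{automation}}}{\left(V_{\text{manual}} + Q_{\text{manual}}\right) - V_{\text{automation}}}
\]

where Q_manual is the per-execution cost of manual errors and inconsistency.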

Doing that would further lower the tipping point number that justifies automation.

Here’s how I sum up these concepts as applied to task automation:

  • If a manual task is easy, it is difficult to justify automating it, because the tipping point number will be very high or never reached.
  • If a manual process is hard and error prone, it is easy to justify automation, i.e. Nt is a low number.
  • If a large percentage of process executions hit exceptions that require manual intervention, it is harder to justify automation.
  • If automation routines are hard to program, or take a lot of time and effort to tweak and maintain over time due to ad hoc run book procedures, it is harder to justify automation.

In the next post, I’ll explore the economic justifications for automation under a service automation strategy.

Follow @VMwareCloudOps for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags.


[1] A. Brown and J. Hellerstein, “Reducing the cost of IT Operations – Is automation always the answer?” IBM Thomas J. Watson Research Center. Proceedings of the 10th Conference on Hot Topics in Operating Systems, June 12-15, 2005, Santa Fe, NM.

The Changing Role of the IT Admin – Highlights from #CloudOpsChat

Last Thursday, we hosted our inaugural #CloudOpsChat on “The Changing Role of the IT Admin.” Special thanks to everyone who participated for making it an informational and thought-provoking conversation. We also wanted to thank Nigel Kersten (@NigelKersten) and Andrea Mauro (@Andrea_Mauro) for co-hosting the chat with us.

We kick-started #CloudOpsChat with the question, “Is increasing automation and virtualization good or bad for your career?”

Our co-host @Andrea_Mauro was the first to answer, making the point that IT is always evolving and you can’t realistically stay static in knowledge and skills. @KurtMilne agreed with Andrea, adding that more standardization and automation will help to foster the Industrial IT era and move away from the “artisanal” IT era, which is good for IT careers. Co-host @NigelKersten emphasized that IT needs to automate or prepare to be in an evolutionary dead-end in ops roles, adding that the business demands of today are too great not to do so. @andrewsmhay echoed Nigel’s thoughts, saying that the increase in automation and virtualization is good, taking a “survival of the fittest” standpoint – IT needs to evolve or perish. @ckulchar added to both Kurt and Andrea’s points, noting that IT needs to shift the focus to enabling app teams to effectively use cloud and not just port existing apps. @jakerobinson also joined the conversation, saying that increasing automation and virtualization is necessary in order to balance IT cost with capability.

With the discussion in full swing, we took to our next question: “How exactly does increasing automation change your job?”

@NigelKersten stated that increasing automation changes many roles, not just IT operations. @KurtMilne chipped in as well, saying that an increase in automation frees up your time to work on things that really matter, providing more value to your business. @jakerobinson had a similar opinion, explaining that automation eliminates human error, which means less unplanned work that he would have to take care of at a later time. @randwacker added that automation also allows businesses to move faster and be more innovative, which is a key value of Infrastructure-as-a-Service and cloud. @lamw offered a great analogy in answering this question, saying that not automating your infrastructure is like ignoring the existence of the assembly line in manufacturing.

We then asked our audience, “Do you think abstraction and better tools decrease the need for deep expertise?”

@DuncanYB thought so, but also added that abstraction does not result in a decrease of deep expertise, as you still need to build a strong foundation. @randwacker agreed with Duncan, as long as the tools package expertise with it. @KurtMilne added that automation and abstraction will definitely reduce the need for everyone to read 2-inch thick manuals. He made a point to say that someone will still need to read the manual in order to set up the automation, but from there others will be able to use the automation without the reading. @wholmes noted that deep expertise is needed in the development lifecycle of a solution, regardless of abstraction or not. He added that abstractions lessen the need for deep expertise in the operational phase of a solution. Both @NigelKersten and @KurtMilne agreed with @wholmes, saying that automation pushes expertise earlier in the service lifecycle.

Next, we asked our participants, “Do you think today’s cloud administrators need programming skills?”

@randwacker answered yes – cloud admins do need programming skills, but that’s quickly getting packaged. @DuncanYB hoped that they would not need programming skills, as he thought scripting was already difficult enough as it is. @NigelKersten pointed out to Duncan that programming could be easier than scripting, as better tools and interfaces make it easier to use the work of others. @jakerobinson said that cloud admins definitely need software development skills – from consuming APIs, as well as understanding agile methods. @ckulchar agreed, and added if cloud admins don’t learn the fundamentals of development, developers will learn cloud admins’ skills, resulting in a need to differentiate themselves. @wholmes said he hoped that cloud admins wouldn’t be required to have programming skills, but it all depends on the cloud.

From there, we asked participants, “Is PowerCLI better than your average scripting language?”

Both @lamw and @wholmes had similar viewpoints, saying that it may or may not be better, but that it depends on the background, which our co-host @Andrea_Mauro agreed with. @lamw chipped in that you have to use the right tool for the right job, and that the key is: if there is an API, you can automate it using a variety of tools – an idea that both @virtualirfan and @jakerobinson supported.

Staying with tools we then asked: “What are the advantages of managing compute, storage and network resources from a single tool?” 

Our co-host @Andrea_Mauro answered that one of the main advantages would be having complete control of all the resources. @NigelKersten added that network/storage configuration being attached to services allows for easier workload migration. @KurtMilne asked if it is reasonable to expect a single admin to effectively manage compute, storage and network, to which @wholmes said yes, but only to provision. If it were end-to-end, it would not be reasonable. However, @kix1979 said that in the current IT environment, no single tool can manage compute, storage and network resources.

We concluded our discussion by asking, “What do you think is the one skill all IT admins should learn this quarter?”

@lamw offered a short and sweet answer: Automation. @maishsk said that IT admins should learn Puppet/Chef, or even both. Co-host @Andrea_Mauro echoed William’s sentiment by saying that they should learn automation with good framework, which @wholmes and @KurtMilne agreed with. Both @kix1979 and @jakerobinson believed it would be important for IT to learn the business value and costs of running IT/services.

Thanks again to everybody who listened or participated in our #CloudOpsChat, and stay tuned for details about our next #CloudOpsChat! Feel free to tweet us at @VMwareCloudOps with any questions or feedback, and join the conversation by using the #CloudOps and #SDDC hashtags.

The Secret to Avoiding the Portfolio Management Bottleneck: Simplicity

By: David Crane  

Delivering a set of standardized infrastructure services is a critical dependency as IT becomes more service oriented. Getting application owners who are used to custom infrastructure to agree to only use standard service configurations may be the defining problem of the cloud era.

The lifecycle of defining new service elements, adding them to the service portfolio, and then formally releasing them for use by adding them to the service catalog is at the very heart of the problem of getting multiple developers and application owners to agree to use standard services.

The process is critical.  And the process must be streamlined and oriented to the needs of users and funders of the service, and not the internal machinations of the IT organization.

However, traditional ITSM Service Portfolio Management is a cumbersome process geared to the needs of the IT organization. It includes numerous points of IT management sign-off, and the process is not optimized for actually developing and releasing new services into use. The traditional approach tends to be heavy on oversight, and light on actually doing work. This approach reduces agility and wastes scarce resources. Not good in an era where increased agility and reduced operating costs are key measures of success.

Things are different within a virtual cloud ecosystem like VMware’s vCloud Automation Center (vCAC). With vCloud Automation Center, authorized users can access standardized services through a secure self-service portal, as vCAC acts as a service governor to help enforce business and IT policies throughout the service lifecycle. In this environment, a radically simplified design lets IT service managers focus their energy on the needs of users and funders and helps them get their work done with minimal internal IT process overhead and friction.

vCAC simplifies portfolio management in two main ways:

  • Policy-based service definition – Through vCloud Automation Center, users can request and manage their compute resources within established operational policies, cutting IT service delivery times. You can build specifications into vCAC that contain the automation policies specifying the inputs needed and the actions required to maintain your portfolio. (A rough, tool-agnostic sketch of the kind of information such a definition captures follows this list.)
  • Improved service transition – Moving a new service out of the portfolio and into the catalog where it can be used requires keeping the portfolio and catalog elements up-to-date and aligned with each other. With vCAC, release and ongoing management functions are built into the tool set, and thus both automated and massively simplified.
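As a purely illustrative, tool-agnostic sketch of what a policy-based service definition might capture (the field names and values below are assumptions for the example, not vCAC’s actual blueprint schema), the essentials are the standard configuration plus the policies that govern its use:

```python
# Hypothetical, tool-agnostic sketch of a policy-based service definition.
# Field names and values are illustrative assumptions, not vCAC's schema.
dev_test_blueprint = {
    "name": "standard-dev-test-3tier",
    "tiers": {
        "web": {"vcpu": 2, "memory_gb": 4, "count": 2},
        "app": {"vcpu": 4, "memory_gb": 8, "count": 2},
        "db":  {"vcpu": 4, "memory_gb": 16, "count": 1},
    },
    "policies": {
        "approval_required": False,      # pre-approved for entitled catalog consumers
        "lease_days": 30,                # temporary services expire by default
        "backup": "none",                # dev/test data is treated as disposable
        "allowed_groups": ["dev-team"],  # entitlement checked at request time
        "max_instances_per_user": 3,     # caps demand without a manual approval gate
    },
}
```

Because the policies travel with the definition, releasing the item into the catalog is enough to make it safely consumable on demand, with no further per-request sign-off.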

One way to think of what’s changed here is in terms of oversight versus enablement. Traditional ITSM can be geared as much as 80% towards oversight, with just a 20% focus on the people who actually go and do the work. The vCAC approach flips that around.

Oversight is still essential, and it’s built in to the new model. Prior to vCAC, traditional ITSM involved significant initial investment, top-heavy input requirements, and repetitive touch points with senior management. vCAC presents fewer, better-designed gates to your workflow, so you can work both safely and fast while gaining the agility that comes with a true cloud environment.

It’s About Standardization

The key to giving cloud consumers the services they want as quickly as possible, while still keeping the necessary corporate controls in place, is standardization.

Under vCAC’s blueprint model, service elements (e.g. backup, capacity, and provisioning requirements, security and other policies etc.) are preapproved to sit in the catalog and are thus ready to be deployed in new ways whenever they’re needed. In other words, if an item is in the catalog, and you have authority to access it, then you can provision at will – without having to go up the chain of command every time you want to respond to customer demand.

The result:

  • Fast, efficient processes focused on quickly delivering new services to users, rather than on feeding the internal IT machine.
  • Simplified processes, with policy-based service definition and improved service transition delivering greater business agility and faster time to market.
  • Automated interfaces between the service portfolio and service catalog, with minimal resources and overhead required.

And you do it with higher quality, and at scale. With a set of preapproved blueprints and policies, it’s much easier to address increases in either the volume or variety of demand that you want to meet, and do it in a way that is more deterministic and improves service quality over time.

What’s more, you’ve done all that while reducing your company’s overhead and the resources you need to draw on.

With the help of vCAC, your portfolio management is simpler, more agile, more efficient and faster-to-market, too.

This is the first in a series of posts we’ll be writing about service portfolio management in a vCloud ecosystem. Next up, we’ll go deeper into the simplified, three-step process of vCloud portfolio management.

Be sure to follow @VMwareCloudOps for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags.

VMware #CloudOps Friday Reading List – Standardization in the Cloud Era

I’ve been reviewing submissions for the Ops Transformation track at VMworld 2013. It is a fascinating look at what a bunch of really smart people think is important in the cloud era. Based on a review of the proposed panel discussions and breakout sessions, there seems to be some consensus that standardization is a key dependency for successfully deploying an automated and scalable service strategy.

The quantity and variety of topics suggests there isn’t yet consensus on how the concept of standardization should be applied. But some of the submitted topics suggest that standardization of service definitions and infrastructure configurations is what makes innovation possible at the business process level – where it counts.

Related reading topics:

Monitoring Strategies in the Cloud by Michael Kavis
Michael Kavis takes a look at best practices when dealing with cloud, including standardizing as much as possible in cloud based systems so that a high level of automation can be put in place.

What Goes Around Comes Around Part 2: Is Standardization Still a Valid Strategy? by Theo Priestley
Priestley argues that standardizing business processes reduces innovation. Note – VMware paper submissions suggest that standardizing IT services and infrastructure enables greater business process innovation.

Resilience Engineering Part 1 and Part 2 by John Allspaw
Great insights on how resiliency, automation, and standardization are all tightly linked.

Follow us on Twitter at @VMwareCloudOps for future updates, and join the conversation using the #CloudOps and #SDDC hashtags.