Author Archives: CloudOps Team

How to Manage Your Cloud: Lessons and Best Practices Direct from CloudOps Experts

Rich Benoit, a Consulting Architect at VMware, and Kurt Milne, VMware’s Director of CloudOps Marketing, are experts when it comes to managing cloud infrastructures. But they didn’t acquire their expertise overnight. Transitioning to cloud management takes time and can leave even seasoned IT pros scratching their heads, asking, “What should I do first? How do I get started?”

Join Rich and Kurt this Thursday, December 12 at 10am PT as they share the fruits of their experience as cloud managers. This webinar will dive into tangible changes that organizations need to make to be cloud-ready, including how to:

• Introduce new, specialized roles into the equation
• Improve event, incident, and problem management processes
• Establish analytics to provide visibility into the cloud

Wondering what to do and how to get started with your cloud infrastructure? Register now to save your spot!

We’ll also be live-tweeting the event via @VMwareCloudOps – follow us for updates. Also join the conversation by using the #CloudOps and #SDDC hashtags. We look forward to seeing you there!

Shifting from ‘Free Services for All’ to ‘Gold/Silver/Bronze Service Bands’

The art and science of leveraging VMware’s IT Business Management solution to manage demand

By: Pierre Moncassin

Here is a real-life customer challenge that I encountered at a workshop with a global pharmaceutical company. The challenge boils down to the question: How do you use tiered service offerings to manage demand in a culture where users are used to receiving only ‘gold plated’ services?

The team I met with is a central IT function delivering centralized, cloud-type services to multiple lines of business distributed globally. Each line of business tends to provision its virtual infrastructure independently, based on project-specific requirements. Many projects are business critical (or at least linked to substantial revenues), so teams tend to ask for the highest service levels offered without really considering lower-service-level alternatives. In the absence of a mechanism to charge business units for their consumption, teams opt for the ‘gain’ of the highest service offerings.

In the past, the IT organization tried to temper demand by standardizing on a few mid-range configurations while requiring more justification for high-specification services. This approach met with some success, and I believe it was heading in the right direction because it shared the “pain” of higher IT costs.

However, in the absence of a consumption-based cost allocation method, users still prefer high-end or non-standard configurations with a higher internal cost. And they will keep doing so as long as they experience all gain and no pain.

The solution is “simple” in principle: The IT organization needs to “share the pain” and cross-charge users according to chosen usage and service levels. But ‘simple,’ of course, does not mean ‘easy.’

Given the complexity of the effort and its potential pitfalls, let’s break the process of putting cross-charging in place into three discrete steps:

Step 1 – Start with the Essential Tools: Deploy VMware’s Chargeback Manager

Okay. I said it. You need a new tool. Spreadsheets just don’t work over the long term. VMware Chargeback Manager enables accurate metering of the cloud-based resources being used. Beyond that, it offers predefined cost models that make it easier to bill consumers according to their usage (with a range of allocation methods).
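To make usage-based charging concrete, here is a minimal Python sketch of metering-based billing. It is not Chargeback Manager’s API or one of its cost models: the metric names, unit rates, and consumer names are hypothetical, and the only technique shown is multiplying each metered quantity by a unit rate and totalling the result per consumer.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical unit rates per metered unit; a real cost model would supply these.
UNIT_RATES = {
    "vcpu_hours": 0.03,
    "ram_gb_hours": 0.01,
    "storage_gb_months": 0.10,
}

@dataclass
class UsageRecord:
    consumer: str    # business unit or project being charged
    metric: str      # must be a key of UNIT_RATES
    quantity: float  # metered amount for the billing period

def bill_of_it(records: list[UsageRecord]) -> dict[str, float]:
    """Total each consumer's metered usage multiplied by the unit rate."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        if r.metric not in UNIT_RATES:
            raise ValueError(f"no unit rate defined for {r.metric!r}")
        totals[r.consumer] += r.quantity * UNIT_RATES[r.metric]
    # Round to cents for presentation on the bill.
    return {consumer: round(total, 2) for consumer, total in totals.items()}

usage = [
    UsageRecord("project-a", "vcpu_hours", 2880),
    UsageRecord("project-a", "ram_gb_hours", 11520),
    UsageRecord("project-a", "storage_gb_months", 500),
    UsageRecord("project-b", "vcpu_hours", 720),
]
print(bill_of_it(usage))  # {'project-a': 251.6, 'project-b': 21.6}
```

A bill like this is accurate, but it itemizes raw resources, which is exactly the kind of breakdown consumers may not find meaningful.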

In addition, Chargeback Manager establishes a stepping stone to VMware’s IT Business Management – the comprehensive solution for managing IT budgets for the cloud. But, for now, let’s assume that you have introduced Chargeback Manager just so that each resource has a cost and an associated ‘bill of IT’ to the consumer (whether internal or external). Is that enough in itself to manage demand?

In my view, there is an implicit but critical additional assumption when presenting consumers with the ‘bill of IT’: that the bill will show items that consumers will a) clearly understand and b) appreciate for their quality.

Compare this with a restaurant: diners would not normally expect an itemized bill showing a breakdown of heating, water rates, and raw ingredients (meat, vegetables) measured by weight. They expect to be charged by the dish. But beyond accepting the costing model, they also hold implicit quality assumptions. They expect that:

  1. Dishes must meet a minimum quality standard, or they will simply leave
  2. There is some link between the price offered and the quality of the dish

If these tacit assumptions about price/quality are not met, chances are that the customers will never come back (or in the case of a private cloud, move on to a public cloud provider).

Step 2 – Offer Tiered Service Categories

Similarly, charging for internal cloud-based services also needs both a well-defined ‘menu’ (i.e. a catalog of services) and a clear relationship between services and price.

In earlier blogs, we commented on the importance of standardizing cloud services and the trade-offs that this implies.

Standardization is a prerequisite for charging, as it defines the ‘menu’ of services offered. It also underpins economies of scale and automation – which make the costs of cloud services attractive in the first place.

However, the introduction of a ‘price tag’ for services means far more than an accounting figure. It means a cultural change – introducing a buyer/seller relationship between the IT organization and the business units. The natural response for the ‘buyer’ side is to focus on price. This is why the IT organization needs to respond – like any experienced retailer would – with a focus on both price and quality of service.

A popular way to communicate service quality is to offer tiered service categories (e.g. ‘bronze/silver/gold’ with associated price bands). These price/quality levels can then be published in the service catalog.
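In the restaurant analogy, the tier is the dish: consumers choose a catalog item and pay its listed price. The Python sketch below illustrates such a tiered catalog and quotes the same request at each tier. The prices, availability targets, and backup policies are hypothetical placeholders, not figures from any real catalog.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    price_per_vm_month: float  # hypothetical price band
    availability_pct: float    # hypothetical written service level
    backup_policy: str         # hypothetical backup commitment

# Hypothetical catalog entries, published so consumers can compare them side by side.
CATALOG = {
    "gold": Tier("gold", 400.0, 99.95, "daily, 30-day retention"),
    "silver": Tier("silver", 250.0, 99.9, "daily, 14-day retention"),
    "bronze": Tier("bronze", 120.0, 99.5, "weekly, 7-day retention"),
}

def quote(tier_name: str, vm_count: int, months: int) -> float:
    """Price a request per catalog item ('per dish'), not per raw resource."""
    tier = CATALOG[tier_name]
    return tier.price_per_vm_month * vm_count * months

# The same 10-VM, 12-month request quoted at each tier.
for tier in CATALOG.values():
    print(f"{tier.name:<6} {tier.availability_pct}% availability, "
          f"{tier.backup_policy}: {quote(tier.name, 10, 12):,.0f}")
```

Showing the bronze quote next to the gold one is what makes the cost/quality trade-off visible; without published prices, every request tends to default to gold.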

Step 3 – Facilitate Cultural Change Through Communication

Introducing tiered service categories will have ripple effects throughout the IT organization and beyond, as it will foster – and perhaps enforce – a service-oriented attitude. Externally, it will shift the image of the IT organization from a cost center to that of a commercially-focused service broker.

It’s a cultural change that can’t be left to chance, however, so I highly recommend supporting it with a communication management plan.

In an earlier blog, I shared some ideas for making your communication plan as effective as possible. But I want to emphasize a couple here that can help speed your move to a tiered service approach – and mindset.

Within the IT organization, we want to empower the Tenant Operations teams not just to deliver the best possible service (as a matter of course), but also to manage end-user expectations. In practice, a workshop such as Cloud Operations’ Service Definition Process Optimization can go a long way in helping these internal IT teams crystallize their thinking around what to promise their users by way of IT services, setting the foundations for clearly understood, two-way agreements between themselves and their ‘consumers.’

When it comes to communication with the end-users (consumers), I’d recommend thinking in terms of a sales campaign (whether pitched within the same company or not) that emphasizes both short term and long term ‘wins’:

  • Short-term ‘wins,’ such as increased control over their provisioning spend, and clearly-defined service quality underpinned by written service levels
  • Long-term ‘wins,’ such as being more closely involved in the service definition process – i.e. having more say in how services will be adapted to their evolving needs.

The Way Forward: Service Differentiation Beyond ‘Gold/Silver/Bronze’

The global pharmaceutical company I was working with is already embracing many of these concepts, and has introduced a tiered service model at the infrastructure component level (e.g. server, storage). That, in turn, has paved the way for further service development: once the tiered service approach has full adoption, the next step will be to expand the model towards the application level – offering numerous opportunities to add further value for consumers. The tiering model can be extended even further, to more complex application-related services, such as continuity management, database services, etc.

This ‘win-win’ perspective is at the core of the Cloud Operations approach to managing demand: it is not about taking capacity away from end users, but about offering a more informed range of choices with a clear trade-off between cost and quality. Given the right choices, even the most vocal consumers will rethink picking the ‘gold-plated’ option every time.

Summary – Key Steps to Manage Demand for Cloud-Based Services:

  • Introduce Chargeback Manager to establish a robust foundation for service costing
  • Link service quality to service prices – think like a retailer
  • Offer a simple, tiered range of choice to consumers
  • Differentiate the end-to-end services, not the service components
  • Facilitate cultural change with a communication plan (internal and external)

Follow @VMwareCloudOps and @Moncassin on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

Task Automation Vs. Process Automation – Highlights from #CloudOpsChat

After a successful automation-themed #CloudOpsChat in September, we decided to take a deeper dive into automation for this month’s edition, discussing “Task Automation Vs. Process Automation.” Thanks to everyone who participated, and thank you especially to Rich Pleasants (@CloudOpsVoice), Business Solutions Architect and Operations Lead for Accelerate Advisory Services at VMware, for co-hosting!

To begin the chat, we asked: “What IT tasks or processes has your company successfully automated?”

@Andrea_Mauro jumped right in, asking how automating processes compares with automating tasks. @kurtmilne offered VMware’s take, saying “VMware IT has fully automated provisioning of complex workloads on private cloud,” and clarified that the most complex workloads were “Oracle ERP with web portals, and over 80 blueprints.” @venkatgvm also elaborated on VMware’s automation story: “VMware instance provisioning had over 20 major steps, each of them were executed by siloed teams.”

Co-host @CloudOpsVoice took the question further, asking, “Are people automating day to day maintenance activities or actual steps in the process?”

@vHamburger gave his advice on where to begin with automation, saying “[day-to-day automation] is a good starting point. Nominate your top 10 time-consuming tasks for automation.” @Andrea_Mauro replied, suggesting that “task automation is more for repeatable operations and day by day [tasks].” He followed up by offering a definition of process automation: “Process automation could be more related to organization level and blueprint usage.” @kurtmilne also chimed in with business-related definitions of task and process automation: “Task automation math includes cost/time of single task vs. developing automation capability…Process automation math includes business benefit of overall improved agility, service quality – as well as cost.” @CloudOpsVoice broke his definition of automation into three parts: “day-to-day, build and run.”
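Kurt’s “task automation math” can be written as a simple break-even check: automating pays off once the time saved per run, summed over enough runs, covers the time spent building the automation. The Python sketch below assumes time is the only cost, and all figures in it are hypothetical. It deliberately leaves out the business benefits (agility, service quality) that he places in the process-automation math.

```python
import math
from typing import Optional

def automation_break_even(manual_minutes: float,
                          automated_minutes: float,
                          build_hours: float) -> Optional[int]:
    """Return the smallest number of runs at which automating a task has cost
    no more total time than doing it by hand, or None if it never pays off."""
    saving_per_run = manual_minutes - automated_minutes
    if saving_per_run <= 0:
        return None  # no per-run saving, so the build cost is never recovered
    build_minutes = build_hours * 60
    # runs * manual >= build + runs * automated  <=>  runs >= build / saving
    return math.ceil(build_minutes / saving_per_run)

# Hypothetical numbers: a 45-minute manual task, 5 minutes once automated,
# and 20 hours to build and test the automation.
print(automation_break_even(45, 5, 20))  # 30
```

A task that runs daily clears a 30-run threshold within about a month and a half; one that runs quarterly may never be worth automating on time savings alone.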

@CloudOpsVoice next asked, “What technique do you use primarily for automation? Policy, orchestration or scripting? How do App blueprints impact it?”

@kurtmilne noted the value of blueprints and scripting: “Blueprints and scripting allow app provisioning automation – not just VM provisioning.” @thinkingvirtual also offered sound advice on how to select what to automate at your company: “Always make sure your automation efforts provide real value. Don’t automate for automation’s sake.” Elaborating on this, @kurtmilne discussed the value of automation, stating that automation’s “real value” is “ideally measured in business outcomes, and not IT efficiency.” @vHamburger also warned about bottlenecks: “every enhancement after your bottleneck is not efficient – know your bottlenecks!”
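The bottleneck point follows from a basic property of serial workflows: end-to-end throughput can be no higher than the capacity of the slowest step. The Python sketch below assumes a hypothetical four-step provisioning workflow with made-up daily capacities, and it ignores queueing and variability. It only shows why speeding up a step that sits after the bottleneck does not raise throughput.

```python
def throughput(step_capacity: dict[str, int]) -> int:
    """Requests per day a serial workflow can complete: capped by its slowest step."""
    return min(step_capacity.values())

def bottleneck(step_capacity: dict[str, int]) -> str:
    """Name of the step with the lowest capacity."""
    return min(step_capacity, key=step_capacity.get)

# Hypothetical workflow, in order, with the requests each step can finish per day.
steps = {"approve": 40, "build_vm": 25, "configure_app": 8, "qa_signoff": 30}
print(bottleneck(steps), throughput(steps))  # configure_app 8

# Automating a step that comes after the bottleneck changes nothing end to end...
faster_qa = {**steps, "qa_signoff": 60}
print(throughput(faster_qa))  # 8

# ...while relieving the bottleneck itself does.
faster_config = {**steps, "configure_app": 20}
print(bottleneck(faster_config), throughput(faster_config))  # configure_app 20
```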

@vHamburger went on to mention task workflow: “Clean task workflow with documented steps is always preferred over scripts,” he suggested, because it’s “easier and repeatable for new admins.” @Andrea_Mauro countered by saying that sometimes a “‘quick and dirty’ solution could be good enough,” to which @vHamburger replied, “In my experience ‘quick and dirty’ always leads to fire fighting ;).” @kurtmilne then vouched for “leaning out” a process: “‘Leaning out’ an IT process is good. But sometimes it’s better to use automation to eliminate tasks vs. automate tasks,” he wrote. @thinkingvirtual also noted how important communication is to successful automation: “Often forgotten: keep your business in the loop. Show back the value continuously to broaden the relationship.”

@AngeloLuciani kept things moving by asking, “Do you pick a tool to fit the process or a process to fit the tool?”

@JonathanFrappier enthusiastically went with the latter: “Process to fit the tool! Processes can change, tools have to live on until more budget is approved!” @kurtmilne added, “Tool/process construct doesn’t make sense with full automation. You can do things with automation you can’t do with manual tasks: For example, you don’t figure out manual horizontal scaling process in cloud – then look for tool to automate.”

#CloudOpsChat ended with one last great tip (and a nod to VMworld!) from @thinkingvirtual: “Automation skills are a huge career opportunity. Don’t avoid automation, defy convention.”

Thanks again to everybody who participated in this latest #CloudOpsChat, and stay tuned for details of our next meet up. If you have suggestions for future #CloudOpsChat topics, let us know in the comments.

For more resources on automation, check out our other CloudOps blog posts on the topic.

In the meantime, feel free to tweet us at @VMwareCloudOps with questions or feedback, and join the conversation by using the #CloudOps and #SDDC hashtags. For more from Rich Pleasants, head over to the VMware Accelerate blog.

The Critical Element of Service Delivery in the Cloud Era: Join our Webcast 11/14

As more companies aim to build the software-defined datacenter (SDDC), the importance of service definition continues to grow. Running a successful SDDC strategy means understanding service offerings, for sure. But it’s also about standardizing those offerings to achieve agility and efficiency. So where do you start? How do you know what services your company can best provide?

Join Product Manager Jason Holmberg and Business Solutions Architect Rohan Kalra on Thursday, November 14th at 10am PT for their BrightTALK webcast: The Critical Element of Service Delivery in the Cloud Era. The webcast will take you through the four fundamental Service Catalog building blocks:

  • Automation
  • Governance and policies
  • Provisioning and orchestration
  • Lifecycle management

Both Jason and Rohan have years of experience building and implementing service catalogs. In addition to defining these building blocks, Jason and Rohan will dive into the requirements for each component, making it easier for you to implement a service catalog and making sure that you’re delivering the best services to your users through your catalog. Don’t miss this webcast to learn how service definition will be the key to your SDDC.

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

Task Automation vs. Process Automation: Join Us For #CloudOpsChat 11/13!

Here at VMware, we’re always talking about automation: Venkat Gopalikrishan detailed his success after automating the provisioning of business-critical application stacks, and Paul Chapman introduced VMware’s IT transformation story by highlighting the importance of automation and change management.

We saw some fantastic insight on automation from many of you during our last #CloudOpsChat and wanted to continue the conversation. For this month’s #CloudOpsChat, we’re specifically focusing on next steps in automation by asking the following questions: What has your company successfully automated? Do you focus on task automation, process automation, or both?

Join us on Wednesday, November 13th at 11am PT to discuss task vs. process automation with your CloudOps peers. Hosting the chat is CloudOps expert Rich Pleasants, Business Solutions Architect and Operations Lead for Accelerate Advisory Services at VMware.

During the chat, we’ll discuss:

  • What business tasks or processes has your company successfully automated?
  • When discussing automation, how do you determine whether you should automate a task vs. a process?
  • Do you have people in your company whose primary role is automation?
  • What technique do you use primarily for automation? Policy, orchestration or scripting?
  • How does the blueprint concept impact your automation workflow (scripting, orchestration)?

Here’s how to participate in #CloudOpsChat:

  • Follow the #CloudOpsChat hashtag (via TweetDeck or another Twitter client) and watch the real-time stream.
  • On Wednesday, November 13th at 11am PST, @VMwareCloudOps will pose a few questions using the #CloudOpsChat hashtag to get the conversation rolling.
  • Tag your tweets with the #CloudOpsChat hashtag. @reply other participants and react to their questions, comments, thoughts via #CloudOpsChat. Engage with each other!
  • #CloudOpsChat should last about an hour.

In the meantime, RSVP to the event and feel free to tweet us at @VMwareCloudOps with any questions you may have! For even more on automation, check out Rich Pleasants’ latest blog post for VMware Accelerate where he discusses “Intelligent Automation.”

We look forward to seeing you in the stream!

7 Communication Tips to Facilitate Culture Change When Adopting a Cloud Model

By: Pierre Moncassin

In my last blog, I talked about the importance of mindset change as organizations progress toward a cloud transformation based on VMware technology. I outlined how an organization’s culture needs to fully embrace the service orientation that’s at the heart of the cloud model if it is to be successful.

A recent discussion I had with a lead technologist from a global organization prompted some further insights into the cultural change that is key for cloud adoption. This organization already had a mature approach to cloud-based services, and was planning a longer-term cloud transformation. However, the technologist noted, his group hadn’t automatically “learned” a cloud-based model.

That struck a chord. His experience illustrates a challenge that even organizations with advanced cloud maturity can encounter: moving to a cloud approach requires a specific learning path for the organization, a path that cannot be left to chance or routine organizational learning. Organizations and individuals usually excel at absorbing incremental changes. For individuals, this is sometimes called “learning on the job.” But on-the-job learning will not work for a ‘disruptive technology’ like the cloud, because the cloud model does not just require changing how the job is done: it requires a new way of thinking about the job itself. In my opinion, the learning will best happen by design, and I will share some ideas on how to engineer that learning.

But first, let’s explore in more depth the reasons that can make this learning path challenging:

A Major Change

As a starting point, it’s important to acknowledge that moving to a fully fledged cloud service delivery model really is a ‘paradigm shift.’ While the expression is overused, ‘paradigm shift’ does convey how a cloud transformation is not just a quantitative change (an uplift in utilization, for example, or a reduction of capital expenditure) but a qualitative one as well. It introduces both new metrics and new concepts.

How do organizations respond to change? Well, most are highly capable of adapting to incremental, quantitative changes. Ask for a change in their internal processes, metrics, or in response to their external environment and, typically, they will take it in their stride. As the global technologist described it to me: When it comes to incremental changes, most organizations are in ‘Self Learning Mode.’ They just absorb the changes as routine.

But with a cloud transformation, the ‘Self Learning’ mode rarely works. The learning curve is steeper than with incremental change, and it requires learning fundamentally new attitudes and approaches. Because of this, such cultural learning is much more at risk of resistance to change, which is one reason, of course, why learning a cloud model is so hard to do.

There is nothing wrong per se when organizations experience resistance to change. They are made up of individuals, after all, and aversion to change is a common, instinctive reaction when we’re faced with new knowledge and new demands.

But how do you get around, or head off, that resistance and enable cultural change when you need it? It begins with planning effective communication about why things need to be done differently, what needs to change, and how that change can be effected. The question, then, is how can we structure this communication for optimum impact?

The Key to Organizational Learning

A great way to encourage and accelerate organizational learning is to adapt some key principles of adult learning (a body of practice that focuses on how adults learn).

  • For example, compared with children, adults tend to be more goal-oriented, so it helps to paint a compelling vision of the future-state organization early on.
  • At the same time, adults can be more questioning – so a broad-brush vision should be supported by concrete, applicable information, so that everyone can see how the new org culture will help their daily activities in the short and medium term.
  • On the same basis, adults also bring substantial prior knowledge and experience, and will try to relate new knowledge to their past experience. So when introducing new concepts like cloud service models, it is worth showing how these new models can relate to more traditional models (such as IT Service Management).

We also want to leverage positive group dynamics. People naturally listen to, and model, leaders in their teams and the broader organization. So look to enlist visible, charismatic leaders within the organization: these may be roles with formal authority (e.g. IT executives) but should also include key influencers with a more informal following.

Let’s turn these insights into 7 actionable recommendations for optimizing communication to facilitate culture change.

In summary:

  1. Enlist visible leaders to paint the vision and to own, or act as the conduit for, regular communication.
  2. Communicate often and in small chunks. This makes the information easier to absorb and remember. Plan to communicate over an extended period of time throughout the transformation and beyond.
  3. Include enough detail to make the communication personal and practical whenever possible. Explain the practical relevance to day-to-day tasks.
  4. Vary the communication approaches. Some people prefer visual communication (engineers often do, for example, since they are used to schematics and diagrams), while others respond better to verbal communication.
  5. Avoid overly specialized vocabulary (a.k.a. “jargon”) or overly technical content, especially in the early stages when cloud concepts are being introduced.
  6. Explain the continuity with, or at least the relationship to, the “traditional” approaches, concepts, and practices that individuals are familiar with.
  7. Welcome resistance to change (to a point). This is a normal part of the learning process. Allow concerns and questions to be aired as early as possible. It is worth remembering that sometimes your most hard-core initial skeptics, once convinced, turn into your most dedicated and vocal supporters.

Follow @VMwareCloudOps and @Moncassin on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

A VMware Perspective on IT as a Service, Part 3: Agility, How to Measure it, and Keep Improving it Over Time

By: Paul Chapman, VMware Vice President Global Infrastructure and Cloud Operations

In this series of posts, I’m offering a VMware Corporate IT perspective on the journey to IT as a Service, looking at how we made the change ourselves, sharing some of the many benefits that ITaaS is bringing us, and offering some insights on how – if you’re considering taking the plunge – you might successfully make the transition yourself.

I started out by offering my take on the journey to IT as a Service. Then I shared the story of how our applications operations group transformed its operations by shifting to the Software Defined Data Center. This time, I’ll explain how we think of agility at VMware.

If I had to pick just two reasons why we decided to transition to IT as a Service (ITaaS) at VMware, they would be:

a) It would let IT deliver at the speed of business,


b) It would make IT a game changer – by providing business transformation through IT transformation, we’d be helping the business scale and grow.

If our vision was to get IT running at the speed of business, delight our users every day, and turn IT into an innovation center rather than a roadblock, then agility, efficiency, and an ITaaS mindset were the engines that would get us there.

Importantly, though, we put agility first. Many businesses think of IT as a cost center and put efficiency and cost first. While the “cut ’til it hurts” approach can drive down costs and make efficiency metrics look better, it can also leave IT hamstrung and unable to innovate and support changing business needs. Our big insight was that investments that significantly improved agility ALSO resulted in higher efficiency and service quality gains. By leading with agility, we could achieve both.

How We Think of Agility

When we think about agility, it’s on three levels. We’re aiming for:

  1. Zero demand. Every time anyone has to make an IT request, that’s a potential point of friction. They’ve had to stop what they’re doing and ask for something to be done, and then wait even longer for a response before they can move on. So our goal has been to completely remove their need to ask in the first place. You’d be surprised how much low-hanging fruit there is here. There are lots of diamonds in the backend data, as long as IT studies it with a mindset of elimination rather than just resolution speed and SLA improvements.
  2. More self-service. Self-service is another key component to agility and ITaaS. The alternative is that our customers have to go to a help desk, the last place they should go, and wait for someone to get back to them through some kind of queued ticketing system. Taking IT out of the equation and giving users the ability to self-serve significantly increases speed-to-solution and user satisfaction.
  3. The ability to serve complex requests. If you have made progress on the first two, the calls and tickets you do get will naturally involve more complex or lower-volume requests. So what’s the best way to respond to them? Here’s how we do it: we’ll see, for example, that on average it’s taking n days to deliver or solve service request x. Then we ask (if we cannot eliminate it or provide self-service): how can we slice up these requests, automate workflow and individual tasks, and reduce handoffs as much as possible? It’s not a one-time deal; it’s an ongoing, incremental quest to reduce the time it takes to deliver on the demand that comes to IT. Hiring or moving people to focus on this, rather than hiring in proportion to request volume, is far more likely to have a transformative impact on IT and its agility in serving customer needs.

Measuring Agility

The shorter we can make the delivery time for any service, even the most complex, the better off our customers are in terms of being able to get what they need to do their job. So how do we measure our responsiveness?

Our goals here are very aggressive. We’re looking to reduce the time it takes to deliver a particular service from months, weeks, and days to hours, minutes, and seconds, and in some cases to eliminate the need altogether.

Specific examples of metrics I like to use to gauge my organization’s agility include:

  • The number of service requests that have been completely eliminated, either by resolving the underlying need at its root or by fully automating the solution
  • The number of service requests that are self-served
  • Where no automation or elimination is possible, the cost and speed of delivering the solution, measured from the time the user creates the request
  • The services offered through the ‘outside-in’ IT Service Catalogue
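The first three metrics above can be computed from an ordinary request log. The Python sketch below is one possible reading, not VMware’s actual reporting. It assumes each request carries a hypothetical channel label (“automated”, “self_service”, or “manual”); that requests eliminated at root simply stop arriving, so they are counted as the drop in volume against a baseline period; and that time-to-deliver and cost are reported only for requests that still need manual handling. It reports fully automated requests separately from the baseline drop, although the list above groups both under elimination, and it leaves out the service catalogue metric.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

@dataclass
class Request:
    kind: str          # request type, e.g. "vm-provision" (hypothetical)
    channel: str       # assumed labels: "automated", "self_service", "manual"
    created: datetime  # when the user raised the request
    delivered: datetime
    cost: float        # assumed cost to deliver, in any consistent unit

def agility_metrics(baseline: dict[str, int], requests: list[Request]) -> dict:
    """Summarize one period's requests against a baseline period's volumes."""
    current: dict[str, int] = {}
    for r in requests:
        current[r.kind] = current.get(r.kind, 0) + 1
    # Requests that no longer arrive at all, relative to baseline volume per kind.
    eliminated = sum(max(n - current.get(kind, 0), 0) for kind, n in baseline.items())
    manual = [r for r in requests if r.channel == "manual"]
    hours = [(r.delivered - r.created).total_seconds() / 3600 for r in manual]
    return {
        "eliminated_vs_baseline": eliminated,
        "fully_automated": sum(r.channel == "automated" for r in requests),
        "self_served": sum(r.channel == "self_service" for r in requests),
        "manual": len(manual),
        "mean_hours_to_deliver": mean(hours) if hours else None,
        "mean_cost_to_deliver": mean(r.cost for r in manual) if manual else None,
    }

t0 = datetime(2013, 11, 1, 9, 0)
log = [
    Request("vm-provision", "automated", t0, t0 + timedelta(minutes=10), 0.0),
    Request("vm-provision", "self_service", t0, t0 + timedelta(minutes=30), 0.0),
    Request("db-refresh", "manual", t0, t0 + timedelta(hours=48), 300.0),
    Request("db-refresh", "manual", t0, t0 + timedelta(hours=24), 100.0),
]
print(agility_metrics({"password-reset": 50, "vm-provision": 2}, log))
```

Tracking these figures period over period, rather than against a fixed time-based SLA, shows whether demand is actually being eliminated or shifted to self-service.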

I am not a fan of time-based SLAs because I think they promote the wrong behavior: customer satisfaction, elimination, automation, and self-service are far more meaningful than time-based SLAs.

More fundamentally, IT should always be trying for an outside-in view and not an inside-out view. All too often, Service Catalogues are based on IT defining its services and how a customer can “order them” vs. offering the services the “customer needs and wants.”

Predicting Demand in a Ticketless World

Done right, agility means offering services on-demand before the customer even makes a request. It’s a nice idea, but how does IT predict what issues might come up so that it can preemptively have solutions ready before they’re needed?

At VMware, one approach has been to significantly increase focus and investment on having the right forensics in place. This lets us go from reactive, to proactive, to predictive, and to be very aware of everything that is going on in the environment.

In addition, we monitor internal social communications for sentiment and issues that might be brewing. Today we see a lot of activity on our internal Socialcast collaboration site, which is helping us get ahead of issues as well as have a more intimate relationship with our users.

Then we add transparency. Internally, we have a portal that shares the quality of service delivery at any given point. So if we’re seeing degradation in a network connection, for example, or a quality of service issue with a particular application, anybody in the company can see what we know, what we’re looking at, and what we’re communicating about the issue.

Key Takeaways:

  • Agility, efficiency, and organizational mindset are at the heart of the move to ITaaS.
  • Agility inevitably drives increased efficiency, but efficiency alone won’t lead to meaningful agility gains.
  • At VMware, we think of agility in terms of customer-centric, zero demand, automation, and self-service.
  • We measure agility in terms of speed and cost.
  • Forensics and transparency are key to predicting demand, and identifying and communicating issues.
  • Without true agility, it is very painful and costly for organizations to scale and to remain nimble at the same time.
  • The power of automation plus the power of a self-service mindset lets IT organizations help scale the business in a customer centric, cost-effective way.

Parts 1 and 2 of this series are also available on this blog. Next time, I’ll explain what it took to stand up and run our own internal private cloud with ~50k VMs.

Follow @VMwareCloudOps and @PaulChapmanVM on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

A VMware Perspective on IT as a Service, Part 2: An In-house Example of IT Transformation

By: Paul Chapman, VMware Vice President Global Infrastructure and Cloud Operations

In this series of posts, I’m offering a VMware Corporate IT perspective on the journey to IT as a Service (ITaaS), looking at how we adopted the movement ourselves, sharing some of the many benefits that ITaaS is bringing us, and offering some insights on how – if you’re considering taking the plunge – you might successfully make the transition yourself.

Last time, I outlined the context for the movement to IT as a Service – one that suggests we’re now at a point where IT can no longer hide behind the complexity of IT environments, and where IT organizations need to deliver on new consumer expectations of service delivery if they are to have the agility and efficiency to deliver at the speed of business.

Today, I’m going to share the story of one of the functional IT groups at VMware – our Applications Operations group – that has transformed by shifting to a focus on agility and automation, with game changing results. If you’re curious to learn more, check out the full case study, or a short summary video here.

A Problem with Process

Here’s what happened: The Cloud Operations group within VMware corporate IT oversees the support of a portfolio of 200+ business applications. The application operations team (AppOps) provisions and manages very complex SDLC development and test environments for a team of 600+ global developers and quality assurance engineers who work on the VMware program portfolio.

By the middle of 2012, the AppOps team realized that it faced a serious issue with provisioning these environments.

As things stood, their processes were:

  • Slow – Manually provisioning a dev/test SDLC instance for a full enterprise applications ecosystem was taking in the range of 4-6 weeks per instance,
  • Disruptive – Hundreds of developers had to wait for a reliable new instance for extended periods of time, multiple times during the lifecycle,
  • Risky – Cascading delays kept other portfolio projects from starting and/or completing on time, potentially costing millions of dollars,
  • Inconsistent – Quality and lead times were unpredictable, varying with schedule complexity, different outcomes from manually repeated processes, and the capacity and availability of team members distributed around the globe.

The knock-on impact of a delay was very costly. Every time a new environment experienced delays, developers were idle and millions of dollars were at stake. This made portfolio planning inordinately difficult. We could have shrunk the portfolio and slowed the delivery of business critical programs in response, but that was unacceptable given our overall corporate growth objectives.

Not surprisingly, IT was under considerable pressure to increase its agility, speed, and throughput.

Not the Easy Fix

Clearly, AppOps needed to reduce provisioning times and increase schedule predictability and service quality.

One way to do that would have been to try to improve the efficiency of the large “human middleware” already in place, applying lean methodologies and trying to be as “efficient” as possible when executing standard, repeatable tasks.

However, a thorough process review made it clear that more than a continuous efficiency program was required. The primary issue was that they were scheduling and managing a large number of people who were, for the most part, performing skilled but repeatable tasks. Even with an improved provisioning process, the human-middleware problem would never fully go away, and speed and predictability could never reach the desired goals.

Instead, the AppOps group chose to replace its provisioning process entirely with an automated one, running on a VMware on-premises private cloud based on the software-defined data center. SDLC instance provisioning would be fully automated using blueprints, policies, and the automation and management capabilities of the VMware vCloud® Suite and other adjacent tools.
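To make the idea of blueprint-based provisioning a little more concrete, here is a minimal Python sketch. It assumes a blueprint can be modeled as a list of components with dependencies, and that “provisioning” means deploying those components in dependency order. The `Blueprint`, `Component`, and `provision` names are hypothetical illustrations, not vCloud Suite APIs, and the four-tier example is a cut-down stand-in for a real 30+ application instance.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Component:
    """One tier of an SDLC instance (hypothetical model, not a vCloud Suite type)."""
    name: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Blueprint:
    """A reusable description of a complete dev/test instance (hypothetical)."""
    name: str
    components: list[Component]

    def provisioning_order(self) -> list[str]:
        # Map each component to the components it depends on, then sort so
        # that every component comes after all of its dependencies.
        graph = {c.name: set(c.depends_on) for c in self.components}
        return list(TopologicalSorter(graph).static_order())


def provision(blueprint: Blueprint, request_id: str) -> list[str]:
    """Return the deployment steps for one request, in dependency order.

    A real implementation would call the cloud management platform here;
    this sketch only records the steps it would take.
    """
    return [f"{request_id}: deploy {name}" for name in blueprint.provisioning_order()]


# Example: a cut-down, four-tier stand-in for an SDLC instance.
sdlc = Blueprint("erp-dev-test", [
    Component("portal", ["erp"]),
    Component("erp", ["database", "middleware"]),
    Component("middleware", ["database"]),
    Component("database"),
])

for step in provision(sdlc, "REQ-001"):
    print(step)
```

The design point is that the blueprint, not a person, encodes the order of work, so every request produces the same result. `static_order()` raises `graphlib.CycleError` if a blueprint accidentally contains a circular dependency, which catches that class of mistake before anything is deployed.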

If they were to succeed, two factors would be critical:

  • Ambitious, long-term objectives. To be successful, any solution needed to be game changing – instead of making incremental improvements to the existing process, AppOps was looking to turn a process that traditionally took 4-6 weeks into one taking just a matter of hours. Solving this problem required a radically different approach, built from the ground up.
  • An available private cloud. VMware had already deployed, at scale, its private cloud (called ‘Project OneCloud’), delivering infrastructure-as-a-service (IaaS) capabilities for internal use. With vCloud Suite’s automation and management capabilities, the private cloud could host all non-production SDLC instances – eliminating the need for lengthy hardware provisioning cycles.

By late 2012, the AppOps team was ready to start building the new, automated and streamlined provisioning platform, setting itself the goal of deploying all Dev/Test SDLC instances within 24 hours of a request.

Doing this meant driving transformation in three areas:

  • Architecture – Shifting from a traditional virtualized data center environment to an SDDC private cloud and deploying cloud management with automation capabilities to provision complex SDLC environments. Each instance contains over 30 applications, including the company’s full ERP, custom applications, portals, middleware, IDM, BI, web servers, app servers, integrations, databases, and more.
  • Operations – Converting manual, time consuming processes to an end-to-end, automated scripted process with blueprint-based provisioning. Key employee transitions would include investments in change-management and supporting employees through training and education, moving them to more value-added and meaningful roles in the new cloud operating model.
  • Financial – Moving from a project-capex based infrastructure funding model to a service-opex consumption and chargeback model. Instead of IT incurring the costs of building and maintaining infrastructure to support the virtual machines, it could pass the cost of workloads to individual project requestors. In turn, because instances can be provisioned quickly and their opex service costs are transparent, requestors now de-provision instances more often, which has increased infrastructure utilization and reduced spend on net-new infrastructure.
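A consumption-based chargeback model can be sketched in a few lines. The minimal Python example below assumes each workload is billed per VM-hour at a rate that depends on VM size; the rates, project names, and the `UsageRecord`/`chargeback` names are hypothetical and are not VMware’s actual pricing or tooling. Costs are kept in integer cents to avoid floating-point rounding.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical unit rates in cents per VM-hour; illustrative only, not VMware pricing.
RATE_CENTS_PER_VM_HOUR = {"small": 5, "medium": 10, "large": 20}


@dataclass(frozen=True)
class UsageRecord:
    project: str   # the requesting project that is charged
    vm_size: str   # key into RATE_CENTS_PER_VM_HOUR
    hours: int     # hours the VM existed during the billing period


def chargeback(records: list[UsageRecord]) -> dict[str, int]:
    """Total each project's workload cost (in cents) for one billing period."""
    totals: dict[str, int] = defaultdict(int)
    for r in records:
        totals[r.project] += RATE_CENTS_PER_VM_HOUR[r.vm_size] * r.hours
    return dict(totals)


usage = [
    UsageRecord("project-a", "medium", 720),  # kept for the whole 30-day month
    UsageRecord("project-a", "medium", 720),
    UsageRecord("project-b", "large", 100),   # released after 100 hours
]
print(chargeback(usage))  # {'project-a': 14400, 'project-b': 2000}
```

Because project-b released its VM after 100 hours, it pays for 100 VM-hours instead of 720 (which would have cost 14400 cents). That visible price signal is what makes de-provisioning worthwhile for requestors.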

The Payoff and Business Benefit

Phase one of the project – deploying basic automated provisioning and management capabilities – has now been completed. 2,800 virtual machines that support dev/test instances have been transitioned to the new OneCloud environment, resulting in game-changing benefits:

  • Reduced provisioning time from 4-6 weeks to 36 hours: on track to achieve goal of <24 hours,
  • Increased productivity of 600 developers by as much as 20 percent,
  • Improved service quality so that AppOps can now consistently say “Yes” to all project requests in the time required,
  • Saved the business $6M per year in infrastructure and operating costs,
  • Moved people to higher-order, more meaningful IT roles, e.g. blueprinting and automation design.

Phase two will focus on further enhancing automation and management capabilities and transitioning more pre-production environments to the private cloud.

Lessons Learned

  • Agility investments are self-sustaining. Investing in increased agility yields significant additional benefits, such as substantially reduced operating and infrastructure costs and increased service quality.
  • vCloud Suite is a full solution. The AppOps team implemented vCloud Suite to automate provisioning and management of SDLC instances. Out-of-box functionality let them automate and manage a wide range of core tasks. The availability of SDKs and APIs let them deliver additional automation and management functionality through adjacent tools.
  • On-demand capabilities change IT service consumption. SDLC instances are no longer viewed with the same risk outlook as before. Where developers and application owners formerly felt the need to keep an instance open for multiple and/or ongoing projects, AppOps can now release those instances back into the provisioning pool in a “disposable infrastructure” service consumption model.
  • APIs replace ticketing and late-night meetings. A service catalog and API calls help IT clarify and simplify communication about the services AppOps delivers and what its customers can expect in return. Efficiency has replaced the time-consuming, difficult, and highly-variable task of scheduling and coordinating work between multiple, globally distributed teams.

Key Takeaway:

The VMware corporate IT organization decided to invest in improving agility, and, as a byproduct, not only increased service speed and quality, but also dramatically lowered IT infrastructure and operating costs.

Next time, I’ll look at agility: how we measure it and how we keep continuously improving. In Part 4, I’ll explain what it took to stand up and run our own internal private cloud, which so far includes ~50k VMs.


Follow @VMwareCloudOps & @PaulChapmanVM on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

Implementing a Cloud Infrastructure Is About Changing Mindsets: Three Ways Cloud Operations Can Help

By: Pierre Moncassin

A few weeks ago, I had the privilege of attending the first in a series of cloud operations customer roundtables in Frankfurt, Germany. The workshop was expertly run by my colleague Kevin Lees, principal consultant at VMware and author of “Organizing for the Cloud” as well as numerous VMware CloudOps blog posts.

Customer participation in the round table exceeded our expectations – and was highly revealing. It quickly became obvious that process and organization challenges ranked at the top of everyone’s priorities. They needed no convincing that a successful cloud deployment needs operations transformation in addition to leading-edge tools.

Even so, I was amazed how rapidly the conversation turned from technical strategy to organizational culture and, most importantly, changing mindsets.

I remember one customer team in particular outlining for us the challenge they face in operating their globally-distributed virtual infrastructure. They were acutely aware of the need to transform mindsets to truly leverage their VMware technology – and of how difficult that was proving to be.

For them, changing mindsets meant looking beyond traditional models, such as the monolithic CMDB (an idea deeply entrenched in physical IT). It also meant handling the cultural differences that come with teams based in multiple locations around the world and, more than ever, aligning teams with different functional objectives to common goals and gaining commitments across boundaries.

To state the obvious, changing organizational mindsets is a vast topic, and many books have been written about it (with many more to come, no doubt). But here I want to explore one specific question: How can cloud operations help IT leaders, like our customer above, in their journeys to mindset change?

For them, I see three main areas where cloud operations can bring quick wins:

1) Create Opportunities to Think Beyond ‘Classic’ IT Service Management

Part of the journey to cloud operations is to look beyond traditional frames of reference. For some of our customer teams, the CMDB remains an all-powerful idea because it is so entrenched in the traditional ITSM world. In the world of cloud infrastructure, the link between configuration items and physical locations becomes far less rigid.

It is more important to create a frame of reference around the service definition and everything needed to deliver the service. But adopting a service view does require change, and that’s not something that we always embrace.

So how do you encourage teams to “cross the chasm?” One simple step would be to encourage individuals to get progressively more familiar with VMware’s Cloud Operations framework (by reading ‘Organizing for the Cloud,’ for example).

After that, they could take on a concrete example via a walk-through of some key tools. For example, a VMware vCenter Operations Manager demo can illustrate how a cloud infrastructure can be managed in a dynamic way. It would show how dashboards automatically aggregate multiple alerts and status updates. Team members would see how built-in analytics can automatically identify abnormal patterns (signaling possible faults) in virtual components wherever they are physically located. A demo of vCloud Automation Center’s use of blueprints to automate provisioning of full application stacks would show how new tools that leverage abstraction can help break through process-bound procedures that were developed for more physical environments.

All of this would build familiarity with, and likely excitement at, the possibilities inherent in cloud-based systems.
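vCenter Operations Manager’s built-in analytics are far more sophisticated than anything that fits in a blog post, and the sketch below is not its algorithm. For team members new to the idea, though, one common generic technique shows what “automatically identifying abnormal patterns” can mean: flag a metric sample that deviates sharply from its recent rolling baseline. The window size, threshold, and CPU figures are arbitrary assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples: list[float], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return indexes of samples that deviate strongly from the recent baseline.

    A sample is flagged when it lies more than `threshold` standard deviations
    from the mean of the preceding `window` samples (a rolling z-score).
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all counts as abnormal.
            if samples[i] != mu:
                anomalies.append(i)
        elif abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilization (%) for one VM, sampled at regular intervals.
cpu_percent = [40, 42, 41, 39, 40, 41, 95, 40, 42, 41]
print(find_anomalies(cpu_percent))  # [6]
```

The point for a walk-through is that nobody had to set a fixed “alert above 90%” rule per VM: the baseline is learned from each component’s own recent behavior, wherever that component physically runs.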

2) Break Down Silos with the Organizational Model

A key principle of VMware’s cloud operations approach is to break down silos by setting up a Center of Excellence dedicated to managing cloud operations. You can read more about how to do that in this post by Kevin Lees.

The main point, though, is that instead of breaking processes up by technology domain (e.g. Windows/UNIX) or by geography, Cloud Operations emphasizes a consistency of purpose and a focus on the service delivered that is almost impossible to achieve in a siloed organizational structure.

Simply by creating a Cloud Infrastructure Operation Center of Excellence, you are creating a tool with which you can build the unity that you need.

3) Boost Team Motivation

Lastly, although a well-run cloud infrastructure should in itself add considerable value to any set of corporate results, don’t forget the influence held by individual team members facing a change in their work practice.

In particular, consider their likely answer to the question “What’s in it for me?”

Factors that might positively motivate team members include:

  • Acquiring new skills in leading-edge technologies and practices (including VMware certifications, potentially)
  • Contributing to a transformation of the IT industry
  • Being part of a well-defined, well-respected team e.g. a Center of Excellence.

So, remember to make that case where you can.

Here, then, are three key ways in which you can leverage cloud operations to help change mindsets:

  1. Understand that moving to cloud is a journey. Every person has their own pace. Build gradual familiarity with both new tools and new concepts. Check out more of our CloudOps blog posts and resources!
  2. Build a bridge across cultural differences with the Center of Excellence model recommended by VMware CloudOps.
  3. Explain the benefits to the individual of making the jump to cloud e.g. being part of a new team, gaining new skills – and a chance to make history!

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

To Automate or Not to Automate? – Highlights from #CloudOpsChat

Last week, we held another successful #CloudOpsChat, this time asking: “To Automate or Not to Automate?” Thank you to everyone who participated in the lively conversation, and especially to our two co-hosts, Cloud Operations Architects Andy Troup (@HarrowAndy) and David Crane (@DaveJCrane)!

To start things off, we asked, “How do you define automation?”

Our co-hosts jumped in first, with @HarrowAndy stating, “automation = stop doing repeatable tasks,” and @DaveJCrane remarking on how he asked the same question during a group discussion at VMworld and received 50 different answers from 50 different people in the room! In addition, the notion that automation implies the removal of manual work was a prominent theme, with @Seemaj, @AngeloLuciani, @tcrawford and @KongYang agreeing that automation means less, or no, human intervention (see Pierre Moncassin’s take on that here).

Next, the conversation moved on to the importance of defining automation within the context of your business.

@DaveJCrane began by adding a layer to the definition, suggesting that it is “important to consider the definition of automation in context of the business environment, not just process focus.” @tcrawford agreed with David, specifying a difference between the what/why of automation, as well as the how/when. @HarrowAndy built upon @tcrawford’s response, adding that there must always be a benefit to what you’re automating, and that there is “no point automating something you only do infrequently.”

@Seemaj then brought up the cost of automation, agreeing with @tcrawford that: “There is a cost to automation, and the business drives those decisions.”

@AngeloLuciani stated that “automation drives business value,” and @tcrawford stirred the pot, replying “sometimes it can, not always.” @HarrowAndy then brought up the importance of weighing automation’s benefits against its costs, with @KongYang, @AngeloLuciani and @Seemaj adding that two of the biggest benefits to automation are limiting human mistakes and delivering services faster. @Gnowell1 emphasized automation’s goal of promoting reliable service delivery, saying “time consuming, complex tasks should also be considered for automation.”
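@HarrowAndy’s point that there is “no point automating something you only do infrequently” can be checked with a back-of-the-envelope payback calculation. The minimal Python sketch below assumes that both the cost of building an automation and its benefit can be measured in staff hours; it deliberately ignores harder-to-quantify benefits such as fewer human mistakes or faster delivery. The function name and all figures in the example are hypothetical.

```python
from typing import Optional


def months_to_break_even(
    build_hours: float,
    manual_minutes_per_run: float,
    automated_minutes_per_run: float,
    runs_per_month: float,
) -> Optional[float]:
    """Months until the effort spent building an automation is paid back.

    Returns None if the automation saves no time, i.e. it never pays back.
    """
    saved_hours_per_month = (
        (manual_minutes_per_run - automated_minutes_per_run) * runs_per_month / 60
    )
    if saved_hours_per_month <= 0:
        return None
    return build_hours / saved_hours_per_month


# Same task and same 40-hour build effort, run at two different frequencies.
frequent = months_to_break_even(40, 90, 5, runs_per_month=20)
infrequent = months_to_break_even(40, 90, 5, runs_per_month=1 / 3)  # once a quarter
print(f"{frequent:.1f} months vs {infrequent:.1f} months")  # 1.4 months vs 84.7 months
```

With identical build effort, the frequently run task pays back in under two months, while the quarterly one takes about seven years, which is exactly why the business context, not just the process, should drive what gets automated.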

After that, @KalraRohan asked, “What’s driving everyone to move towards automation?”

@VmwDavidH immediately offered VMware’s use case for automation: “For us, we have cut our dev environment provisioning time down from weeks to hours.” @Seemaj noted business agility as her main reason, with @AngeloLuciani saying that automation is a “building block” towards the software-defined datacenter (SDDC). @DaveJCrane agreed, adding that “[automation] is always good to implement as part of a larger ops transformation.”

@KalraRohan then asked, “What are the operational impacts of automation? What are best practices?”

@HarrowAndy, @VmwDavidH and @AngeloLuciani all agreed that a set of orchestration tools was essential in driving the success of automation. @GNowell1 suggested a key benefit that automation provides to a business: “SDDC automation promotes Ops standards. Administrators spend more time on higher level responsibilities.” And @DaveJCrane elaborated further on automation’s ability to shift ops’ focus: “Automating allows you to put more emphasis on the workflow/approval process.”

To close out this #CloudOpsChat, @HarrowAndy asked: “So what have you all automated? Is it just provisioning activities, or are there other things?”

@AngeloLuciani and @Gnowell1 had both started with provisioning and said they were looking for the next step in automation. @CloudOpsVoice stated that provisioning was a great start and great use-case for the ‘run’ side of automation, with @tcrawford adding “iterative automation is all about value.” He continued by saying that knowing what to automate next comes with “experience, and asking questions.” @Seemaj agreed, and emphasized that automation touches all aspects of a company: “Automation is not just about provisioning/tools/scripts…and it does not always have measurable outcomes. Sometimes benefits are soft benefits, e.g. improved user experience.”

Our #CloudOpsChat wrapped up with a positive outlook on the future of automation, with @AngeloLuciani tweeting “automation will be a major skill for next gen IT staff.” As automation progresses, companies will experience “less firefighting in operations and more time spent on working with the business,” suggested co-host @HarrowAndy.

Thanks again to everybody who participated in this latest #CloudOpsChat, and stay tuned for details of our next #CloudOpsChat!

In the meantime, feel free to tweet us at @VMwareCloudOps with questions or feedback, and join the conversation by using the #CloudOps and #SDDC hashtags.