Monthly Archives: September 2013

VMworld Recap: Hear From the CloudOps Experts, Part 2

As we discussed in Part 1 of our VMworld 2013 recap, cloud technology is empowering organizations to re-think IT – an opportunity underlined in our September #CloudOpsChat, which explored how automation lets IT operations service clients efficiently while focusing on meeting business objectives.

In the second and concluding post in this series, we further explore those possibilities. Hear from CloudOps experts Ed Newman, Kevin Lees with Khalid Hakim, Valentin Hamburger with Bjoern Brundert, Jeff Ton, and Paul Chapman as they discuss their VMworld operations transformation sessions and share their thoughts on the future of CloudOps:

Ed Newman:

Kevin Lees and Khalid Hakim:

Valentin Hamburger and Bjoern Brundert:

Jeff Ton:

Paul Chapman:

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

To Automate or Not to Automate? – Highlights from #CloudOpsChat

Last week, we held another successful #CloudOpsChat, this time asking: “To Automate or Not to Automate?” Thank you to everyone who participated in the lively conversation, and especially to our two co-hosts, Cloud Operations Architects Andy Troup (@HarrowAndy) and David Crane (@DaveJCrane)!

To start things off, we asked, “How do you define automation?”

Our co-hosts jumped in first, with @HarrowAndy stating, “automation = stop doing repeatable tasks,” and @DaveJCrane remarking on how he asked the same question during a group discussion at VMworld and received 50 different answers from 50 different people in the room! In addition, the notion that automation implies the removal of manual work was a prominent theme, with @Seemaj, @AngeloLuciani, @tcrawford and @KongYang agreeing that automation means less, or no, human intervention (see Pierre Moncassin’s take on that here).

Next, the conversation moved on to the importance of defining automation within the context of your business.

@DaveJCrane began by adding a layer to the definition, suggesting that it is “important to consider the definition of automation in context of the business environment, not just process focus.” @tcrawford agreed with David, distinguishing the what/why of automation from the how/when. @HarrowAndy built upon @tcrawford’s response, adding that there must always be a benefit to what you’re automating, and that there is “no point automating something you only do infrequently.”

@Seemaj then brought up the cost of automation, agreeing with @tcrawford that: “There is a cost to automation, and the business drives those decisions.”

@AngeloLuciani stated that “automation drives business value,” and @tcrawford stirred the pot, replying “sometimes it can, not always.” @HarrowAndy then brought up the importance of weighing automation’s benefits against its costs, with @KongYang, @AngeloLuciani and @Seemaj adding that two of the biggest benefits of automation are limiting human mistakes and delivering services faster. @Gnowell1 emphasized automation’s goal of promoting reliable service delivery, saying “time consuming, complex tasks should also be considered for automation.”
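To make the cost-versus-benefit point concrete, here is a rough break-even sketch. It is an illustration only: the function name and the hour figures are made up, and it counts time saved and nothing else, so it leaves out benefits such as fewer human mistakes or faster service delivery.

```python
def break_even_runs(build_hours: float, manual_hours_per_run: float,
                    automated_hours_per_run: float = 0.0) -> float:
    """Number of runs after which automation has repaid its build cost (time only)."""
    saved_per_run = manual_hours_per_run - automated_hours_per_run
    if saved_per_run <= 0:
        # Nothing is saved per run, so the build cost is never recovered on time alone.
        return float("inf")
    return build_hours / saved_per_run


# Hypothetical task: 2 hours by hand, 0.5 hours once automated, 30 hours to automate.
runs = break_even_runs(build_hours=30, manual_hours_per_run=2, automated_hours_per_run=0.5)
print(runs)  # 20.0
```

A task performed only a few times a year would take a long time to reach 20 runs, which is one way to read the remark about infrequent tasks.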

After that, @KalraRohan asked, “What’s driving everyone to move towards automation?”

@VmwDavidH immediately offered VMware’s use case for automation: “For us, we have cut our dev environment provisioning time down from weeks to hours.” @Seemaj noted business agility as her main reason, with @AngeloLuciani saying that automation is a “building block” towards the software-defined datacenter (SDDC). @DaveJCrane agreed, adding that “[automation] is always good to implement as part of a larger ops transformation.”

@KalraRohan then asked, “What are the operational impacts of automation? What are best practices?”

@HarrowAndy, @VmwDavidH and @AngeloLuciani all agreed that a set of orchestration tools was essential in driving the success of automation. @Gnowell1 suggested a key benefit that automation provides to a business: “SDDC automation promotes Ops standards. Administrators spend more time on higher level responsibilities.” And @DaveJCrane elaborated further on automation’s ability to shift ops’ focus: “Automating allows you to put more emphasis on the workflow/approval process.”

To close out this #CloudOpsChat, @HarrowAndy asked: “So what have you all automated? Is it just provisioning activities, or are there other things?”

@AngeloLuciani and @Gnowell1 had both started with provisioning and said they were looking for the next step in automation. @CloudOpsVoice stated that provisioning was a great start and great use-case for the ‘run’ side of automation, with @tcrawford adding “iterative automation is all about value.” He continued by saying that knowing what to automate next comes with “experience, and asking questions.” @Seemaj agreed, and emphasized that automation touches all aspects of a company: “Automation is not just about provisioning/tools/scripts…and it does not always have measurable outcomes. Sometimes benefits are soft benefits, e.g. improved user experience.”

Our #CloudOpsChat wrapped up with a positive outlook on the future of automation, with @AngeloLuciani tweeting “automation will be a major skill for next gen IT staff.” As automation progresses, companies will experience “less firefighting in operations and more time spent on working with the business,” suggested co-host @HarrowAndy.

Thanks again to everybody who participated in this latest #CloudOpsChat, and stay tuned for details of our next #CloudOpsChat!

In the meantime, feel free to tweet us at @VMwareCloudOps with questions or feedback, and join the conversation by using the #CloudOps and #SDDC hashtags.

VMworld Recap: Hear From the CloudOps Experts, Part 1

As cloud technology advances, IT organizations are working hard to keep up: We covered the changing role of the IT Admin during our May #CloudOpsChat, and Kevin Lees previously discussed the culture shift required for IT Transformation.

At VMworld, the Operations Transformation track offered some great opportunities to hear more about how IT can successfully make the change – transitioning from practices rooted in the client-server era, to those optimized for virtualized resource pools, more automation, and new application architectures common in the cloud era.

We caught up with many of our CloudOps experts on-site at VMworld to ask them about their sessions and what they think the future holds for IT operations in the cloud era.

For Part 1 of this blog series, we’ll hear from experts Venkat Gopalakrishnan, Kurt Milne, Ed Hoppitt with Phil Richards, David Crane, and Rich Bourdeau with Rich Pleasants:

Venkat Gopalakrishnan:

Kurt Milne:

Ed Hoppitt and Phil Richards:

David Crane:

Rich Bourdeau and Rich Pleasants:


Stay tuned for Part 2 and more insights on cloud operations from the CloudOps experts at VMworld.

To Automate, or Not to Automate? Join Us For #CloudOpsChat 9/18!

We talk about automation regularly here on the CloudOps blog – Kurt Milne looked into the economics of task and service automation, Andy Troup broke down the automation Scripting, Orchestration and Technology Love Triangle, and, more recently, Pierre Moncassin discussed how automation doesn’t always mean removing the human element from your workflows. Automation, of course, also continues to be a hot topic for our customers.

For our next #CloudOpsChat on Wednesday, September 18th at 11am PST, we’d like to invite our CloudOps audience to keep the conversation going, discussing: “To Automate, or Not to Automate? – considerations and best practices for IT admins looking to take advantage of the benefits of automation in a smart but effective way.”

Co-hosting the chat will be our very own CloudOps bloggers Andy Troup and David Crane, both Cloud Operations Architects at VMware.

During the chat, we’ll discuss:

  • What approach have you taken to identify suitable processes to automate?
  • What process areas have you started automating?
  • What technique is primarily used for your automation? Policy, orchestration or scripting?
  • What makes a process a good candidate for automation?
  • What challenges has your organization faced when approaching automation?
  • What business processes have you successfully automated in your organization?

Here’s how to participate in #CloudOpsChat:

  • Follow the #CloudOpsChat hashtag (via Twubs.com, Tchat.io, TweetDeck, or another Twitter client) and watch the real-time stream.
  • On Wednesday, September 18th at 11am PST, @VMwareCloudOps will pose a few questions using the #CloudOpsChat hashtag to get the conversation rolling.
  • Tag your tweets with the #CloudOpsChat hashtag. @reply other participants and react to their questions, comments, thoughts via #CloudOpsChat. Engage with each other!
  • #CloudOpsChat should last about an hour.

In the meantime, RSVP to our #CloudOpsChat and feel free to tweet at us at @VMwareCloudOps with any questions you may have. We look forward to seeing you in the stream!

The Paradox of Re-startable Workflows: A More Efficient, Automated Process Does Not Always Mean Removing the Human Element

By: Pierre Moncassin

A chance conversation with a retired airline captain first brought home to me the paradox of automation. It goes something like this: Never assume that complete automation means removing the human element.

The veteran pilot was adamant that a commercial aircraft could be landed safely with the autopilot – but, he explained, contrary to what some people believe, that does not mean the human pilot can just push a button and sleep through the landing. Instead, it means that the autopilot handles the predictable, routine elements of the landing while the pilot plays the vital role of supervising the maneuver and reacting to any unforeseen situations.

We’ve seen a similar paradox at play in workflow automation situations faced by some of our enterprise customers. Here’s a typical scenario: A customer has deployed an automated provisioning workflow using VCO, in some cases along with vCD. They have relied on VCO scripting to automate the provisioning steps so that end users can provision infrastructure just by “pushing a button.” As with the aircraft autopilot (though hopefully less life-threatening), the automated workflows work well until an unexpected situation occurs – there’s an error in the infrastructure, a component with a key dependency changes, or the key dependency itself changes.

This often means a failed workflow, and sometimes an error message that the end user struggles to interpret. After a couple of “failed workflow” experiences, the end user is quickly discouraged, user satisfaction plummets and… need I say more?

Well, this is not what automation is supposed to be all about. We want maximum user satisfaction. The missing element here is an error recovery mechanism, one that very often involves human intervention. So how does that work?

One approach, in terms of VCO workflows, is to build error handling into the workflows. It is not possible to predict all error situations, of course, but it is possible to detect error situations and issue an error message to an administrator; this at least enables the interception of the condition, which may be simple to fix.
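As a language-neutral illustration of that idea (this is plain Python, not VCO scripting, and every name in it is hypothetical), a step can be wrapped so that any failure is caught and flagged to an administrator instead of surfacing raw to the end user:

```python
from typing import Callable, List


def run_step(name: str, action: Callable[[], None],
             notify_admin: Callable[[str], None]) -> bool:
    """Run one workflow step; on any failure, flag it to an administrator."""
    try:
        action()
        return True
    except Exception as exc:
        # Not every error can be predicted, so catch broadly and report what happened.
        notify_admin(f"Step '{name}' failed: {type(exc).__name__}: {exc}")
        return False


# Usage: a list stands in for whatever channel reaches the administrator.
admin_inbox: List[str] = []


def attach_network() -> None:
    # Simulated infrastructure error for the example.
    raise ConnectionError("port group not found")


ok = run_step("attach_network", attach_network, admin_inbox.append)
print(ok, admin_inbox)
```

The wrapper does not try to fix anything itself. It only makes sure the condition is intercepted and lands in front of someone who can act on it.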

A second and more advanced part of the solution is to build modular scripts – that way you fix each problem only once and, of course, make your scripts more robust and repeatable over time.

The third part of the solution is to build re-startable workflows. This essentially means giving an administrator or process owner the ability to undo steps at any point in the flow. In the case of a straightforward VM provisioning workflow, the solution might be as simple as removing the VM and automatically restarting the workflow from the beginning.

Or, it could be more complex – perhaps your resources have run out (maybe additional storage needs provisioning), or an issue arises with network settings. In these cases, you may need to troubleshoot before the workflow can re-start. But the point remains the same: A re-startable workflow gives your end users the best chance to complete their original request, rather than stare at an error message.

With error detection, you can roll back to the initial state and flag the error. Once the error is resolved, the administrator can either “resume” or restart the workflow from that known point with a known configuration, or at least with no less knowledge of the state than before.
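The roll back, resume, and restart options can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VCO code: the class and step names are hypothetical, each step is assumed to come with an undo action, and a failed step is assumed to leave nothing behind that needs undoing.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    do: Callable[[], None]
    undo: Callable[[], None]


@dataclass
class RestartableWorkflow:
    steps: List[Step]
    next_index: int = 0               # first step that has not completed yet
    last_error: Optional[str] = None  # flagged for the administrator

    def run(self) -> bool:
        """Run from next_index; stop and record the error at the first failure."""
        self.last_error = None
        while self.next_index < len(self.steps):
            step = self.steps[self.next_index]
            try:
                step.do()
            except Exception as exc:
                self.last_error = f"{step.name}: {exc}"
                return False
            self.next_index += 1
        return True

    def rollback(self) -> None:
        """Undo completed steps in reverse order, back to the initial state."""
        while self.next_index > 0:
            self.next_index -= 1
            self.steps[self.next_index].undo()

    def resume(self) -> bool:
        """Continue from the failed step once the cause has been fixed."""
        return self.run()

    def restart(self) -> bool:
        """Roll back to the initial state, then run from the beginning."""
        self.rollback()
        return self.run()


# Usage: storage runs out on the first attempt, then the administrator fixes it.
log: List[str] = []
env = {"storage_ok": False}


def allocate_storage() -> None:
    if not env["storage_ok"]:
        raise RuntimeError("datastore full")
    log.append("storage allocated")


wf = RestartableWorkflow([
    Step("create_vm", lambda: log.append("vm created"), lambda: log.append("vm removed")),
    Step("allocate_storage", allocate_storage, lambda: log.append("storage released")),
])

first_try = wf.run()          # fails at allocate_storage; create_vm stays completed
first_error = wf.last_error   # what the administrator would see
env["storage_ok"] = True      # administrator provisions more storage
second_try = wf.resume()      # picks up at allocate_storage, not from the start
```

Tracking `next_index` is what makes the workflow re-startable: the administrator always knows which steps completed, so both "resume" and "roll back and restart" begin from a known point.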

Crucially, all the error and exception handling is hidden from the user. That allows the request to complete (or to at least have a better chance of completing) – making for a much better experience for the end user.

It is up to the script designers to decide how much of the error they want to share with the end users – a decision that should be made with the administrator responsible for overseeing the process and responding to exceptions. The goal, though, is to keep end users happy and blissfully unaware of error situations as long as their request is satisfied!

To reiterate my original point: Despite the apparent automaticity of these resolutions, they will have been the result of human intervention along the way.

Finally, as a further step towards optimum organization, I recommend looking at the broader picture of governance around the cloud-related processes. How does the resolution team interact with the Service Desk, for example? Are there policies about when to re-provision instead of repair? Is there a specific organization to manage the cloud-based services? See our whitepaper “Organizing for the Cloud” for an introduction to optimizing the whole IT organization to leverage a cloud infrastructure. But I digress…

In summary – if you are worried that workflow failures may impact your end users:

  • Build resilience into your VCO workflows and related scripts
  • Build in mechanisms to facilitate human resolution for unpredictable situations
  • Create re-startable VCO workflows
  • Identify a process owner who has responsibility and accountability for managing exceptions and errors

Thank you to my colleague David Burgess, who helped me formulate several of the key ideas in this post.

For more, browse our blog for some of our previous posts on automation, and join our upcoming automation #CloudOpsChat on 9/18 with Andy Troup and David Crane!

The CloudOps VMUG Special Interest Group

We’re back from VMworld in San Francisco! We had a great week on-site, filled to the brim with live-tweeting key CloudOps sessions, meeting up with CloudOps executives, and picking winners for our #OpsTrivia and #OpsCaption Twitter contests.

One of the conference highlights was the VMware User Group (VMUG) Special Interest Group Luncheon, held at the W Hotel. Organized by Toronto VMUG Leader Angelo Luciani, the event provided the perfect opportunity for members of the four VMUG special interest groups (CloudOps, Healthcare, Public Sector, and Higher Education) to network over lunch.

The CloudOps group was the very first VMUG special interest group established and has already hosted several events, including a Google+ Hangout. In addition, it holds a monthly discussion of a trending CloudOps topic on the last Thursday of every month.

To learn about the benefits of joining a VMUG special interest group, we caught up with Angelo Luciani during the luncheon:

Don’t miss out on any future CloudOps VMUG events:

  • Register for our VMUG special interest group
  • Subscribe to our news announcements
  • Join the conversation with your CloudOps peers on the forums and on Twitter with the #CloudOps and #cloudopssig hashtags

For more photos from the luncheon and VMworld, head to our CloudOps Flickr page.

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps, #SDDC and #cloudopssig hashtags on Twitter.