Tag Archives: cloud

7 Key Steps to Migrate Your Provisioning Processes to the Cloud

By David Crane

In an earlier blog, my colleague Andy Troup shared an experience in which his customer wanted to embark on a process automation project that could have had disastrous (and consequently frustrating and costly) results, because the process itself was inherently unsuitable for automation.

Automating processes is one of the first projects that organizations embark on once a cloud infrastructure is in place, but why? The answer lies in legacy IT organizational structures that have typically operated in silos.

Many of the IT organizations I work with have a number of groups, such as service development, sales engineering, infrastructure engineering, and IT security, that face similar challenges, including (among many others):

  • Applications provisioned across multiple environments such as development, QA, UAT, sales demonstrations, and production
  • Managing deployments of application workloads in a safe and consistent manner
  • Balancing the speed and agility of deploying the services required to deliver and improve business results while meeting compliance, security, and resource constraints

With the agility that cloud computing offers, organizations look to the benefits that automating provisioning processes may bring in overcoming the above challenges, such as:

  • Reduced cycle time and operating costs
  • Improved security, compliance, and risk management
  • Reduced capital and operating expenditures
  • Reduced need for management/human intervention
  • Improved deployment/provisioning time
  • Consistent quality in delivery of services

The IT organizations I work with are often sold these benefits without consideration of the operational transformation required to achieve them. Consequently, when the IT team kicks off a project to automate business processes, especially service provisioning, their focus is on the potential benefits that may be achieved. The result of this focus is that automation becomes a panacea, and not something that should underpin the IT organization’s overall operational transformation project.

As IT leaders, when considering migrating your provisioning processes to your cloud environment, you need to realize that automation alone will not necessarily provide the cure to problems that exist within a process.

You should not consider the benefits of automation in isolation. For example, too much focus on cost reduction can frequently lead to compromises in other areas, leading to objections and resistance from business stakeholders. You should also consider benefits with intangible (or indirect) metrics, such as improved staff satisfaction. Automation frees your technical staff from repetitive (and uninteresting) activities, which results in both improved staff retention and an indirect cost benefit.

As you select processes to migrate from a physical (or virtual) environment to the cloud, the subsequent automation of those processes should not be an arbitrary decision. Frequently, my clients choose processes as candidates for automation based on personal preferences, internal political pressures, or because some process owners shout louder than others!

Instead, the desired business benefits the organization wishes to achieve should be considered in conjunction with a process’s characteristics, attributes, and measurable metrics, and a formal assessment made of its suitability for automation.

Your automation project should also be implemented in conjunction with an organization structure assessment, which may call for transformation and the introduction of new roles and responsibilities to support the delivery of automated and self-optimizing processes.

Important Steps to Your Successful Process Assessment
Based on my experience assisting customers in this exercise, I recommend taking these steps before you embark on a process assessment:

  1. Understand automation and what it actually means and requires. Many organizations embark on automation without understanding what it means, the context of automated processes, or their capabilities. Delivery then either leads to disappointment because automation does not meet expectations, or the process is not truly automated but instead has a few automated features that do not deliver all the expected benefits.
  2. Identify and document the expected business benefits to be achieved through introduction of process automation. This is an important task. Without understanding the benefits automation is expected to achieve, you cannot identify which processes are the correct choices to help you do just that.
  3. Understand cloud infrastructure system management capabilities required to support process automation (e.g., ability to detect environmental changes, process throughput monitoring capability) and implement if required.
  4. Identify ALL processes required to support automated provisioning (e.g., instantiation, governance, approval) to create a process portfolio.
  5. Identify the common process automation characteristics that exist across the process portfolio (e.g., self-configuration, self-healing, self-audit and metric analysis). Note that process characteristics are unique, high-level identifiers of automation across the portfolio.
  6. Identify the common attributes that the process characteristics share. These are more granular than process characteristics and thus may be common to more than one characteristic in the same process.
  7. Identify the metrics available for each process in the portfolio, and apply a maturity assessment based on their ability to be measured and utilized. Metric maturity is an essential part of the assessment, as it determines not just the suitability of the process for automation, but also its capability to perform self-optimization.

Process Assessment Weighting and Scoring
When undertaking a process assessment program, an organization needs to understand what is important and prioritize accordingly. For example, when considering the business benefits of automation, a managed service provider would probably prioritize them differently than a motor trade retail customer would.

Once you’ve prioritized your processes, they can be assessed more accurately and weighted against each identified business benefit. Prioritization and weighting are essential, and you need to consider the outcomes of this exercise carefully so that your process assessment accurately reflects whether processes are suitable for automation.

And remember, as previously mentioned, avoid considering each assessment criterion in isolation. Each process characteristic and associated attribute can have a direct impact on the desired business benefit; however, if its metric maturity is insufficient to support it, the business benefit will not be fully achieved.

For example, let’s say that you have identified that a business process you wish to automate has a self-healing characteristic. One of the attributes the characteristic possesses is the ability to perform dynamic adjustment based on real-time process metrics. The characteristic and attribute would lead you to expect to realize benefits such as reduced cycle time, reduced OpEx, consistent quality of service, and improved staff retention.

However, although you’ve identified the metrics required to meet the characteristic and attribute needs, they are neither measured nor acted upon. Consequently, because the metric maturity level is low, the capability to realize the expected business benefits is also lowered.
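To make the weighting idea concrete, here is a minimal sketch of how such a suitability score might be computed, with metric maturity dampening the weighted benefits. The class, benefit names, weights, and maturity values are illustrative assumptions, not the scoring model shown in the figures.

# Minimal sketch of a weighted process-assessment score. The benefit weights
# and the 0.0-1.0 "metric maturity" dampening factor are illustrative
# assumptions, not a prescribed VMware scoring model.

from dataclasses import dataclass, field

@dataclass
class ProcessAssessment:
    name: str
    # benefit -> weight assigned during the prioritization exercise
    benefit_weights: dict = field(default_factory=dict)
    # benefit -> 0.0-1.0 maturity of the metrics that would evidence it
    metric_maturity: dict = field(default_factory=dict)

    def automation_suitability(self) -> float:
        """Weighted benefit score, dampened by how mature each metric is."""
        total_weight = sum(self.benefit_weights.values()) or 1.0
        score = sum(
            weight * self.metric_maturity.get(benefit, 0.0)
            for benefit, weight in self.benefit_weights.items()
        )
        return score / total_weight   # 0.0 (poor candidate) .. 1.0 (strong candidate)

# Example: a self-healing provisioning process whose metrics exist but are
# neither measured nor acted upon (low maturity) scores poorly despite
# attractive benefit weights.
provisioning = ProcessAssessment(
    name="VM provisioning",
    benefit_weights={"reduced cycle time": 5, "reduced OpEx": 4, "consistent quality": 3},
    metric_maturity={"reduced cycle time": 0.2, "reduced OpEx": 0.3, "consistent quality": 0.1},
)
print(f"{provisioning.name}: {provisioning.automation_suitability():.2f}")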

Figure 1 below shows a small sample of the assessment of a process in relation to a single process characteristic, its common attributes, the anticipated business benefits and their weighting, and the impact that poor metric maturity has on the capability to deliver those benefits.

Figure 1. Assessment displaying impact of low metric maturity

Contrast this with Figure 2 below, which assesses a process with exactly the same characteristics, attributes, and business benefits, but with supporting management capabilities and consequently much-improved metric maturity:

Figure 2. Assessment displaying impact of high metric maturity

Based on this small data sample, the process in Figure 2 is the more likely candidate for automation. The assessment also identifies, and allows the IT organization to focus on, the areas of remediation needed to make other processes suitable automation candidates.

The result is that the IT organization is able not only to realize the business benefits promised by automation more effectively, but also to set realistic expectations with the business, which brings benefits all of its own.

In summary, automation is not a “silver bullet” for broken or inefficient processes. IT leaders need to consider the expected business benefits in conjunction with process characteristics, attributes, and metrics, and in the context of what is important to the business. By assessing the suitability of a process for automation, you can save the cost of a failed project and avoid disappointed stakeholders. Finally, you should not undertake any provisioning process project in isolation from other operations transformation projects, such as organization structure and the implementation of cloud service management capabilities.

I will discuss the steps to success mentioned above in more detail in my next blog.

===

David Crane is an operations architect with the VMware Operations Transformation global practice and is based in the U.K.

 

3 Steps to Get Started with Cloud Event, Incident, and Problem Management

By Rich Benoit

We are now well entrenched in the Age of Software. Regardless of the industry, there is someone right now trying to develop software that will turn that industry on its head. Previously, companies worked with one app that came with its own infrastructure. It was all one technology, and one vendor’s solution. Now there are tiers all over the place, and the final solution uses multiple components and technologies, as well as virtualization. This app is a shape shifter, one that changes based on the needs of the business. When application topology changes like this over time, it creates a major challenge for event, incident, and problem management.

Addressing that challenge involves three major steps that will affect the people, processes, and technologies involved in managing your app.

1. Visualize with a unified view
The standard approach to monitoring is often component- or silo-focused. This worked well when apps were vertical and an entire application ran on one server; but with a new, more horizontal app that spans multiple devices and technologies – physical, virtual, web – you need a unified view that shows all tiers and technologies of an application. That view has to aggregate a wide range of data sources in a meaningful way, and then identify new metrics and metric sources. The rule of thumb should be that each app gets its own set of dashboards: “big screen” dashboards for the operations center that show actionable information for event and incident management; detailed interactive dashboards that allow the application support team to drill down into their app; and management-level dashboards that show a summary business view of application health and KPIs.

By leveraging these dashboards, event and incident management teams can pull up real-time data to diagnose any issues that arise (see example below). Visualization is key in this approach, because it presents the data in a way that actually allows events, incidents, and problems to be identified.

VMware® vCenter™ Operations Manager™ “big screen” dashboard

2. Aggregate
When you’re coordinating a number of distributed apps, establishing timelines and impact becomes a much more complicated process. Here’s where your unified view can start to help identify problems before they occur. Track any events that occur, and then map them back to any changes that have happened. When I’m working with clients, I demonstrate VMware® vCenter™ Operations Manager’s ability to establish dynamic thresholds. Dynamic thresholds track what constitutes common fluctuation and leverage those analytics to establish baselines around what constitutes “normal.” By looking at the overall data in the big picture, the software can avoid false triggering on normal events.
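To illustrate the dynamic-threshold idea, here is a rough conceptual sketch of a rolling baseline that flags values falling well outside learned “normal.” It is not how vCenter Operations Manager implements its analytics; the window size, tolerance, and sample values are assumptions.

# Rough sketch of dynamic thresholding: establish a rolling baseline of
# "normal" and only flag values that fall well outside it, so routine
# fluctuation does not trigger false alerts. Window size and tolerance
# are arbitrary assumptions for illustration.

from collections import deque
from statistics import mean, stdev

class DynamicThreshold:
    def __init__(self, window: int = 288, tolerance: float = 3.0):
        self.history = deque(maxlen=window)   # e.g. one day of 5-minute samples
        self.tolerance = tolerance            # how many std-devs count as abnormal

    def observe(self, value: float) -> bool:
        """Record a sample; return True if it looks anomalous."""
        anomalous = False
        if len(self.history) >= 30:           # need enough data for a baseline
            baseline, spread = mean(self.history), stdev(self.history)
            anomalous = abs(value - baseline) > self.tolerance * max(spread, 1e-9)
        self.history.append(value)
        return anomalous

# Usage: feed one application tier's latency metric into its own detector.
detector = DynamicThreshold()
normal_load = [40.0 + (i % 5) for i in range(60)]    # steady daily pattern
for sample in normal_load + [139.0]:                 # then a sudden spike
    if detector.observe(sample):
        print(f"Potential incident: sample {sample} is outside the learned baseline")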

3. Leverage problem management
Ideally, you will be catching events and incidents before they result in downtime. However, that requires constantly looking for new metrics and metric sources to create a wider view of the app. Problem management teams should be trained to identify opportunities for new metrics and new metric sources. From there, the development team should take those new metrics and incorporate them into the unified view. When an issue occurs and you look for the root cause, also stop to see whether any specific metrics changed directly before the problem occurred. Tracking those metrics could alert you to a possible outage before it occurs the next time. Problem management then becomes a feedback loop in which you identify the root cause, look at the surrounding metrics, and then update the workflows to identify precursors to problems.
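One way to picture that feedback loop, purely as a sketch: when an incident’s root cause is investigated, compare each candidate metric’s behavior just before the incident against its earlier norm, and promote the ones that shifted into the unified view as leading indicators. The function, data shapes, and the 50 percent shift rule below are assumptions, not a product API.

# Sketch of the problem-management feedback loop: given an incident time,
# find metrics that shifted noticeably just beforehand so they can be added
# to the unified view as leading indicators. Data shapes and the 50% shift
# rule are illustrative assumptions only.

def find_precursor_metrics(metrics, incident_time, lookback=30, window=5, shift=0.5):
    """metrics: {name: [(timestamp, value), ...]} sorted by timestamp."""
    precursors = []
    for name, samples in metrics.items():
        before = [v for t, v in samples if incident_time - lookback <= t < incident_time - window]
        just_before = [v for t, v in samples if incident_time - window <= t < incident_time]
        if not before or not just_before:
            continue
        baseline = sum(before) / len(before)
        recent = sum(just_before) / len(just_before)
        if baseline and abs(recent - baseline) / abs(baseline) > shift:
            precursors.append(name)   # candidate leading indicator for next time
    return precursors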

This doesn’t require you to drastically change how you are managing problems. Instead, it just involves adding an extra analytics step that will help with prevention. The metrics you’re tracking through the dashboard will generally fall into three basic buckets:

  • Leading indicators for critical infrastructure
  • Leading indicators for critical applications, and
  • Metrics that reflect end-user experiences

Once you have established the value of finding and visualizing those metrics, the task of problem management becomes proactive, rather than reactive, and the added level of complexity becomes far more manageable.

—————-
Richard Benoit is an Operations Architect with the VMware Operations Transformation global practice and is based in Michigan.

5 Ways Cloud Automation Drives Greater Cost and Operational Transparency

By Kevin Lees

There has always been tension between IT teams and their end customers — not the good kind of tension, but rather the contentious kind that rarely ends well.

It breaks down like this: IT never believes it has enough time, resources, or money, and believes the line of business (LOB) really doesn’t understand what it wants. On the other side, the LOB is rarely happy with IT because response times aren’t fast enough or IT is missing the mark with its capabilities.

This tension leads to inefficient use of resources, both equipment and people. Shadow IT happens when those outside of IT take matters into their own hands and sidestep IT policies and procedures. It can also mean inefficiencies in the allocation of capital, because finance is challenged to track exactly what it costs for IT to deliver. This becomes especially difficult in a shared resource environment, and it will only get more challenging as we move to a fully virtualized stack as defined by the software-defined data center (SDDC).

This can lead to all sorts of problems, fostering mistrust, lost profits, and lost opportunities. You get the idea.

In this post, we’ll explore key ways that cloud automation is critical to fulfilling the promise of cloud and how automation provides opportunities to practice cost and operational transparency as a way to help drive business alignment.

The Promise of Cloud Management
Cloud holds great promise and great responsibility. It provides many advantages to both IT and its stakeholders, but without effective cloud management and automation, the true value will never be realized.

This is true regardless of the type of cloud, whether private enterprise cloud, an external cloud provider, or a hybrid cloud.

As the figure below shows, there are five areas to focus on that not only provide opportunities to drive business alignment, but also provide opportunities to practice the cost and/or operational transparency needed to gain business stakeholders’ trust:

Figure: Five ways cloud automation drives cost and operational transparency

  1. Service quality: The business has to know it can count on the service it’s consuming.
  2. Predictability: Of course, the service has to be predictable. Outages are unacceptable.
  3. Agility: The business needs to quickly react to changing business conditions or proactively get to market before the competition, so IT needs to keep up.
  4. Smart economics: It also has to be cost effective. If it’s not, shadow IT rears its ugly head, and any degree of governance as well as economy of scale efficiencies dissipate into the cloud, outside of IT’s control.
  5. Clear communication: Business stakeholders have to truly understand what they’re getting and how much flexibility of choice is available to them.

That said, IT cannot deploy and run an effective and successful cloud in a vacuum. A truly successful cloud, one that adds real business value, requires alignment among IT, LOB, and finance. It requires a lot of interaction, listening, discussing, and agreeing. Yes, there will be trial and error.

Fortunately, one of the big benefits of cloud when done right (namely agility) is the ability to fail fast, fail often, and try something else.

With alignment and the clear communication required to achieve it:

  • IT can provide solutions and services that add value to the business by meeting its needs, because business is involved in the service definition.
  • LOB stakeholders will have a much better idea of what they’re getting and know it will meet their needs.
  • Finance will understand service costs within a business context to make more informed decisions about how to maximize the budgets and ensure a degree of cost predictability.

If all goes well, the end result is trust and business alignment between the parties.

One final note for IT: you desperately need to take a course in Marketing 101. IT needs to get better at advertising its services and demonstrating its value add so everyone knows what an asset the group is. At VMware, this is something we address explicitly when we help IT customers set up their processes for defining, costing, and offering cloud-based services to their LOB market. Taking a technical service to LOB market is no different than the business taking a service to market. Would they do that without proactive marketing? I don’t think so.

If you found this post helpful, stay tuned for future posts on this topic. Next time, I’ll offer my thoughts on ways to turn IT’s “trust debt” into true business alignment through greater transparency, agility, and technical alignment.

===========
Kevin Lees is Global Principal Architect, Operations Transformation Practice. Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

Forensic IT: Discover Issues Before Your End Users Do

by Paul Chapman, VMware Vice President Global Infrastructure and Cloud Operations

If you’ve ever watched five-year-olds playing a soccer game, there is very little strategy: all the kids swarm the field and chase the ball trying to score a goal.

Most IT departments take a similar sort of “swarming” approach to service incidents and problems when they occur.

For most of my career, IT has been a reactive business: we waited until there was a problem and then scrambled, very effectively, to solve it. We were tactical problem solvers in a reactive mode, and monitoring was focused on availability and capturing degradation in services, rather than being proactive and predictive and analyzing patterns to stay ahead of problems. In the new world of IT as a service, where expectations are very different, that model no longer works.

New and emerging forensics tools and capabilities allow IT to be proactive and predictive—to focus on quality of service and end-user satisfaction, which is a must in the cloud era.

Forensics: A new role for IT
As an example, new network forensics tools to monitor and analyze network traffic may seem a natural fit for network engineers, but at VMware we found the skillsets to be quite different. We need people who have an inquisitive mindset — a sort of “network detective” who thinks like a data analyst and can look at different patterns and diagnostics to find problems before they’re reported or surface as user impact.

Those in newly created IT forensic roles may have a different set of skills than a typical IT technologist. They may not even be technology subject matter experts, but they may be more like data scientists, who can find patterns and string together clues to find the root of potential problems.

Adding this new type of role in the IT organization most definitely presents challenges, as it goes against the way IT has typically been done. But this shift to a new way of delivering service, moving from the traditional swarm model to a more predictive and forensics-driven model, means a new way of thinking about problem solving. Most importantly, forensics has the potential to create a significant reduction in service impact and maintain a high level of service availability and quality.

Quality of service and reducing end user friction
Every time an end user has to stop and depend on another human to fix an IT problem, it’s a friction point. Consumers have come to expect always-on, 100 percent uptime, and they don’t want to take the time to open a ticket or pause and create a dependency on another human to solve their need. As IT organizations, we need to focus more on the user experience and quality of service—today’s norm of being available 100 percent of the time is table stakes.

With everything connected to the “cloud,” it’s even more important for IT to be proactive and predictive about potential service issues. Applications pull from different systems and processes across the enterprise and across clouds. Without the right analysis tools, IT can’t understand the global user experience and where potential friction points may be occurring. In most cases, IT finds out about a poor quality-of-service experience when users complain — perhaps even publicly on their social networks. Unless we get in front of possible issues and take an outside-in, customer-oriented view, we’re headed for lots of complaints around quality of service.

At VMware, we have seen a significant reduction in overall service impact since using network forensics, and we’re keeping our internal customers productive. Focusing on quality of service and finding people with the right skillsets to fill the associated roles has us unearthing problems long before our end users experience so much as a glitch.

———-
Follow @VMwareCloudOps and @PaulChapmanVM on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

Aligned Incentives – and Cool, Meaningful New Jobs! – In the Cloud Era

By: Paul Chapman, VMware Vice President Global Infrastructure and Cloud Operations

Transforming IT service delivery in the cloud era means getting all your technical ducks in a row. But those ducks won’t ever fly if your employees do not have aligned incentives.

Incentives to transform have to be aligned from top to bottom – including service delivery strategy, operating model, organizational construct, and individual job functions. Otherwise, you’ll have people in your organization working against changes that are vital for success, and in some cases almost willing them to fail.

This can be a significant issue with what I call ‘human middleware.’ It’s that realm of work currently done by skilled employees that is both standard and repeatable: install a database; install an operating system; configure the database; upgrade the operating system; tune the operating system; and so on.

These roles are prime for automation and/or digitization – allowing the same functions to be performed more efficiently, more predictably, and game-changingly faster, and giving the IT organization the flexibility it needs to deliver IT as a Service.

Of course, automation also offers people in these roles the chance to move to more meaningful and interesting roles – but therein lies the aligned incentive problem. People who have built their expertise in a particular technology area over an extended period of time are less likely to be incentivized to give that up and transition to doing something ‘different.’

Shifting Roles – A VMware Example

Here’s one example from VMware IT – where building out a complete enterprise SDLC instance for a complex application environment once took 20 people 3-6 weeks.

We saw the opportunity to automate the build process in our private cloud and, indeed, with blueprints, scripting, and automation, what took 20 people 3-6 weeks, now takes 3 people less than 36 hours.

But shifting roles and aligning incentives was also very critical to making this happen.

Here was our perspective: the work of building these environments over and over again was not hugely engaging. Much of it involved coordinating efforts and requesting task work via ticketing systems, but people were also entrenched in their area of expertise and years of gained experience, so they were less inclined to automate their own role in the process. The irony was that in leveraging automation to significantly reduce the human effort and speed up service delivery, we could actually free people up to do more meaningful work – work that in turn would be much more challenging and rewarding for them.

In this case, employees went from doing standard repeatable tasks to high order blueprinting, scripting, and managing and tuning the automation process. In many cases, though, these new roles required new but extensible skills. So in order to help them be successful, we made a key decision: we would actively help (in a step-wise, non-threatening, change-management-focused way) the relevant employees grow their skills. And we’d free them up from their current roles to focus on the “future” skills that were going to be required.

Three New Roles

So there’s the bottom line incentive that can shift employees from undermining a transformation to supporting it: you can say, “yes, your role is changing, but we can help you grow into an even more meaningful role.”

And as automation frees people up and a number of formerly central tasks fall away, interesting new roles do emerge – here, for example, are three new jobs that we now have at VMware:

  •  Blueprint Designer – responsible for designing and architecting blueprints for building the next generation of automated or digitized services.
  •  Automation Engineer – responsible for engineering scripts that will automate or digitize business process and or IT services.
  •  Services Operations Manager – responsible for applications and tenant operation services in the new cloud-operating model.

The Cloud Era of Opportunity

The reality is that being an IT professional has always been highly dynamic. Of the dozen or so different IT positions that I’ve held in my career, the majority don’t exist anymore. Constant change is the steady state in IT.

Change can be uncomfortable, of course. But given its inevitability, we shouldn’t – and can’t – fight it. We should get in front of the change and engineer the transformation for success. And yet too frequently we don’t – often because we’re incented to keep things as they are. Indeed, misaligned incentives remain one of the biggest impediments to accelerating change in IT.

We can, as IT leaders, shift those incentives, and with them an organization’s cultural comfort with regular change. And given the positives that transformation can bring both the organization and its employees, it’s clear that we should do all we can to make that shift happen.

Major Takeaways:

  • Aligning incentives is a key part of any ITaaS transformation
  • Automation will eliminate some roles, but also create more meaningful roles and opportunities for IT professionals
  • Support, coaching, and communication about new opportunities will help accelerate change
  • Defining a change-management strategy that frees employees up and supports their transition is critical for success

Follow @VMwareCloudOps and @PaulChapmanVM on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

The Top 10 CloudOps Blogs of 2013

What a year it’s been for the CloudOps team! Since launching the CloudOps blog earlier this year, we’ve published 63 items and have seen a tremendous response from the larger IT and cloud operations community.

Looking back on 2013, we wanted to highlight some of the top performing content and topics from the CloudOps blog this past year:

1. “Workload Assessment for Cloud Migration Part 1: Identifying and Analyzing Your Workloads” by Andy Troup
2. “Automation – The Scripting, Orchestration, and Technology Love Triangle” by Andy Troup
3. “IT Automation Roles Depend on Service Delivery Strategy” by Kurt Milne
4. “Workload Assessment for Cloud Migration, Part 2: Service Portfolio Mapping” by Andy Troup
5. “Tips for Using KPIs to Filter Noise with vCenter Operations Manager” by Michael Steinberg and Pierre Moncassin
6. “Automated Deployment and Testing Big ‘Hairball’ Application Stacks” by Venkat Gopalakrishnan
7. “Rethinking IT for the Cloud, Pt. 1 – Calculating Your Cloud Service Costs” by Khalid Hakim
8. “The Illusion of Unlimited Capacity” by Andy Troup
9. “Transforming IT Services is More Effective with Org Changes” by Kevin Lees
10. “A VMware Perspective on IT as a Service, Part 1: The Journey” by Paul Chapman

As we look forward to 2014, we want to thank you, our readers, for taking the time to follow, share, comment, and react to all of our content. We’ve enjoyed reading your feedback and helping build the conversation around how today’s IT admins can take full advantage of cloud technologies.

From IT automation to patch management to IT-as-a-Service and beyond, we’re looking forward to bringing you even more insights from our VMware CloudOps pros in the New Year. Happy Holidays to all – we’ll see you in 2014!

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

The Case for Upstream Remediation: The Third Pillar of Effective Patch Management for Cloud Computing

By: Pierre Moncassin

Patch Management fulfills an essential function in IT operations: it keeps your multiple software layers up to date, as free of vulnerabilities as possible, and consistent with vendor guidelines.

But scale that to an ever-dynamic environment like a VMware-based cloud infrastructure, and you have an extra challenge on your hands. Not only do the patches keep coming, but end users keep provisioning and amending their configurations. So how do you keep track of all these layers of software?

In my experience there are three pillars that need to come together to support effective patch management in the Cloud. The first two, policy and automation, are fairly well established. But I want to make a case for a third: upstream remediation.

As a starting point, you need a solid patching policy. This may sound obvious, but the devil is in the details. Such a policy needs to be defined and agreed across a broad spectrum of stakeholders, starting with the security team. This is typically more of a technical document than a high-level security policy, and it’s far more detailed than, say, a simple rule of thumb (e.g. ‘you must apply the latest patch within X days’).

A well-written policy must account for details such as exceptions (e.g. how to remedy non-compliant configurations); security tiers (which may have different patching requirements); reporting; scheduling of patch deployment, and more.
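Because the policy is this detailed, many teams capture it in a machine-readable form that their automation can consume. Purely as an illustration, here is what such a structure might look like; the tiers, day counts, and field names are invented for this sketch, not taken from any VMware template.

# Illustrative (invented) structure for a patching policy that automation can
# consume: per-tier patch windows, deployment scheduling, and how exceptions
# and reporting are handled. Field names and values are assumptions.

PATCH_POLICY = {
    "security_tiers": {
        "dmz":      {"max_days_to_patch": 7,  "maintenance_window": "Sat 02:00-06:00"},
        "internal": {"max_days_to_patch": 30, "maintenance_window": "Sun 01:00-05:00"},
        "dev_test": {"max_days_to_patch": 60, "maintenance_window": "any"},
    },
    "exceptions": {
        "requires": ["risk_assessment", "security_team_approval"],
        "max_duration_days": 90,   # non-compliant configurations must be remediated or re-approved
    },
    "reporting": {"compliance_report_frequency": "weekly", "audience": ["security", "service_owners"]},
    "upstream_remediation": {"update_provisioning_templates": True},   # see the third pillar below
}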

The second pillar is Automation for Patch Management. While the need for a patching policy is clearly not specific to Cloud Infrastructure, its importance is magnified in an environment where configurations evolve rapidly and automation is pervasive. And such automation would obviously make little sense without a well-defined policy. For this, you can use a tool like VMware’s vCenter Configuration Manager (VCM).

VCM handles three key aspects of patching automation:

  1. Reporting – i.e. verifying patch levels on selected groups of machines
  2. Checking for bulletin updates on vendor sites (e.g., Microsoft)
  3. Applying patches via automated installation

In a nutshell, VCM will automate both the detection and remediation of most patching issues.

However, one other key step is easily overlooked – upstream remediation. In a cloud infrastructure, we want to remediate not just the ‘live’ configurations, but also the templates used for provisioning. This ensures that future configurations being provisioned are also compliant. Before the ‘cloud’ era, administrators who identified a patching issue might make a note to update their standard builds in the near future – but there was rarely critical urgency. In cloud environments where new machines might be provisioned, say, every few seconds, this sort of update needs to happen much faster.
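A hedged sketch of what that can look like in practice: when a patch remediation is applied to live workloads, the same remediation is queued against the provisioning templates, so the next machine provisioned is already compliant. The helper functions are placeholders, not VCM or vCloud Automation Center APIs.

# Sketch of upstream remediation: after fixing live workloads, also remediate
# the templates/blueprints they were provisioned from. The helpers below are
# no-op placeholders standing in for real VCM / orchestration calls; they are
# assumptions for illustration, not actual product APIs.

def remediate(target, patch_id):               # placeholder: apply the patch
    print(f"patched {target} with {patch_id}")

def open_change_request(template, patch_id):   # placeholder: raise/approve a change
    return {"approved": True}

def verify_template_build(template):           # placeholder: test-provision from the template
    print(f"verified build from {template}")

def remediate_patch_issue(patch_id, live_workloads, workload_templates):
    # 1. Remediate the running configurations first (the traditional step).
    for vm in live_workloads:
        remediate(vm, patch_id)

    # 2. Upstream step: remediate every template those workloads came from,
    #    so future machines are compliant the moment they are provisioned.
    for template in set(workload_templates.values()):
        change = open_change_request(template, patch_id)   # may need approval
        if change["approved"]:
            remediate(template, patch_id)
            verify_template_build(template)
            # close out the change request here

remediate_patch_issue("patch-2014-001",
                      live_workloads=["web-01", "web-02"],
                      workload_templates={"web-01": "web-template-v3", "web-02": "web-template-v3"})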

As part of completing any remediation, you also need to be sure to initiate a procedure to carry out updates to your blueprints, as well as to your live workloads (see the simplified process view above).

You need to remember, though, that remediating the images will depend on different criteria than the ‘live’ workloads and, depending on the risk, may require a change request and related approval. You need to update the images, test that the updates are working, and then close out the change request.

In sum, this approach reflects a consistent theme across Cloud Operations processes: that the focus of activity is shifted upstream towards the demand side. This also applies to Patch Management: remediation needs to be extended to apply upstream to the provisioning blueprints (i.e. images).

Key takeaways:

  • Policy and automation are two well-understood pillars of patch management;
  • A less well-recognized third pillar is upstream remediation;
  • Upstream remediation addresses the compliance and quality of future configurations;
  • This reflects a common theme in Cloud Ops processes: that focus shifts to the demand side.

Follow @VMwareCloudOps and @Moncassin on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

5 Key Steps to Optimizing Service Quality for Cloud-Based Services

By: Pierre Moncassin

Freebies can be hard to come by on budget airlines – but I recently received one in the form of a free lesson about designing service quality.

It was a hot day and I was on one of these ‘no-frills’ regional flights. This was obviously a well-run airline. But my overall perception of the service quickly changed after I asked for a glass of water from the attendant – who appeared to be serving refreshments generously to everyone on the flight. The attendant asked for my ticket and declared very publicly that I had the ‘wrong category’ of airfare: no extras allowed – not even a plastic cup filled with plain water.

Looking past the clichés about the headaches of no-frills airline travel, it did offer me a real lesson in service quality. The staff probably met all of their operational metrics – but that wasn’t enough to ensure an overall perception of minimally acceptable quality. That impression was shaped by how the service had been designed in the first place.

The same paradox applies directly to cloud services. When discussing newly established cloud services with customers, I often hear that quality is one of their top three concerns. However, quality of service is often equated with meeting specific service levels – what I would call the delivery ‘effort’. I want to argue, though, that you can make all the effort you like and still be perceived as offering poor service, if you don’t design the service right.

Traditional Service – Effort Trumps Architecture

Both budget airlines and cloud-based services are based on a high level of standardization and economies of scale, and consumers are generally very sensitive to price/quality ratios. But if you offer customers a ‘cheap’ product that they regret buying, all of your efforts at driving efficiencies can be wasted. Design, in other words, impacts perception.

So how do you build quality into a cloud service without jacking up the price at the same time? The traditional approach might be to add ‘effort’ – more stringent SLAs, more operational staff, higher-capacity hardware resources. All of those will help, but they will also ‘gold-plate’ the service rather than optimize its design – the equivalent of offering champagne to every passenger on the budget flight.

A Better Way

There is a more efficient approach – one that’s in line with the principles of VMware’s Cloud Operations: build quality upstream, when the service is defined and designed.

Here, then, are five recommendations that can help you Design First for Service Quality:

  1. From the outset, design the service end-to-end. In designing a service, we’re often tempted to focus on a narrow set of immediately important metrics (which might also be the easiest to measure) and ignore the broader perspective. But in the eyes of a consumer, quality hardly ever rests on a single metric. As you plan your initial design, combine ‘hard’ metrics (e.g. availability) with ‘soft’ metrics (e.g. customer surveys) that are likely to impact customer satisfaction down the line.
  2. Map your service dependencies. One common challenge with building quality in cloud services is that cloud infrastructure teams typically lack visibility into which part of the infrastructure delivers which part of the end user service. You can address this with mapping tools like VMware’s vCenter Infrastructure Navigator (part of the vCenter Operations Management Suite).
  3. Leverage key business-focused roles in your Cloud Center of Excellence. Designing a quality service requires close cooperation between a number of roles, including the Customer Relationship Manager, Service Owner, Service Portfolio Manager, and Service Architect (more on those roles here). In my view, Service Architects are especially key to building quality into the newly designed services, thanks to their ‘hybrid’ position between the business requirements and the technology. They’re uniquely able to evaluate the trade-offs between costs (i.e. infrastructure side) and perceived quality (business side). To go back to my airline, a good Service Architect might have decided at the design stage that a free glass of tap water is very much worth offering to ‘economy’ passengers (while Champagne, alas, is probably not).
  4. Plan for exceptions. As services are increasingly standardized and offered directly to consumers (for example, via VMware vCAC for self-provisioning), you’ll face an increasing need to handle exceptions. Perception of quality can be dramatically changed by how such user exceptions are handled. Exception handling can be built into the design, for example, via automated workflows (see this earlier blog about re-startable workflows, and the sketch following this list), but also via automated interfaces with the service desk.
  5. Foster a true service culture. One major reason to set up a Cloud Center of Excellence as recommended by VMware Cloud Operations is to build a team totally dedicated to delivering high-quality services to the business. For many organizations, that requires a cultural change – moving to a truly consumer-centric perspective. From a practical point of view, the cultural change is primarily a mission for the Cloud Leader who might, for example, want to set up frequent exchanges between the other Tenant Operations roles and lines of business.
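As a sketch of the exception-handling point in step 4, here is one way a re-startable workflow might checkpoint its steps and raise a service-desk ticket automatically when something fails. The step model and create_ticket() hook are illustrative assumptions, not vCAC or vCenter Orchestrator APIs.

# Minimal sketch of exception handling designed into a re-startable
# provisioning workflow: each step is checkpointed so a failed request resumes
# where it stopped, and failures automatically open a service-desk ticket.
# Step names, the checkpoint store, and create_ticket() are placeholders.

def create_ticket(request_id, failed_step, error):       # placeholder service-desk hook
    print(f"Ticket opened for {request_id}: step '{failed_step}' failed ({error})")

def run_workflow(request_id, steps, checkpoints):
    """steps: list of (name, callable); checkpoints: dict of completed step names per request."""
    done = checkpoints.setdefault(request_id, set())
    for name, action in steps:
        if name in done:
            continue                                      # already completed on a prior run
        try:
            action()
            done.add(name)                                # checkpoint after each step
        except Exception as exc:
            create_ticket(request_id, name, exc)          # exception handled by design
            return "paused"                               # a re-run resumes from this step
    return "completed"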

In conclusion, designing quality in cloud services relies on a precise alignment between people (organization), processes, and technologies – and on ensuring that alignment from the very start.

Of course, that’s exactly the ethos of Cloud Operations, which shifts emphasis from effort at run time (less significant, because of automation) to effort at design time (only needs to be done once). But that shift, it’s important to remember, is only possible with a cultural change.

Key Takeaways:

  • Service quality is impacted by your initial design;
  • Greater delivery effort might make up for design issues, but this is an expensive way to ‘fix’ a service after the fact;
  • A Cloud Ops approach lets you design first for service quality;
  • Follow our recommended steps for optimizing service quality;
  • Never under-estimate the cultural change required to make the transition.

Follow @VMwareCloudOps and @Moncassin on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

 

Workload Assessment for Cloud Migration, Part 1: Identifying and Analyzing Your Workloads

By: Andy Troup

Conducting a thorough workload analysis can make or break the success of a cloud strategy.

If you are successful in assessing workloads and placing them in the appropriate private, hybrid, and public cloud environments, you will be better able to fulfill your cloud strategy and enable greater agility and cost efficiency. If your assessment is unsuccessful, these benefits will be much harder to achieve, and you could see higher costs, lower performance, and unhappy customers. Remember, success breeds success: if you have happy customers who are realizing the benefits of your cloud implementation, others will be knocking at your door. If you are unsuccessful, the pipeline of customers will very rapidly dry up.

In this four-part series, I’ll explain four main considerations that you should examine when performing a workload assessment. In this blog, I’ll suggest a framework to use to classify workloads as potential candidates for moving to a cloud environment. My next three blog posts in this series will cover service portfolio mapping, analyzing the cost and benefits of moving to the cloud, and last but not least, stakeholder analysis.

Common Questions

When assessing workloads to identify candidates, I often find myself asking:

  • What criteria should be considered when determining what workloads are a good fit for a new cloud environment?
  • What is the best way to capture and evaluate the criteria with minimal effort and impact on a busy IT department?

A thoughtful and efficient workload assessment framework can simplify and streamline the analysis. Without the right methodology, it can be difficult to know where to start, let alone where to finish. The larger the number of workloads, the more complex the prioritization task becomes.

Here are common considerations and requirements that factor into a potential migration:

Business Impact:

  1. Take a look at the workload and evaluate its impact on your business. Is it a business-critical workload? How does it affect your company? Take the answer to this question and assess it against where you are on your cloud journey. You wouldn’t want to move mission-critical workloads into your cloud during your first days after “go live,” would you?
  2. For which application lifecycle phase will the workload be used (for example, development, test or production)? What are the different requirements for each environment?

Application Architecture:

  1. Is the application written for a cloud environment? If not, make sure you understand the impact of migrating it into the cloud.
  2. How hard or expensive is it to refactor the application for the new environment (e.g., do you need to remove hard-coded resource paths)? What are the scaling considerations? Can you already scale horizontally, adding capacity by adding instances, or can you only scale up by adding more resources to a single instance?

Technical Aspects:

  1. What operating systems, databases or application servers are being consumed or provided and how hard will it be to also migrate them into the cloud?
  2. Do your database, application server and web server run on the same type of platform?
  3. What quantity of CPU, memory, network and storage are typically used/needed? Can your cloud implementation support this?
  4. What commercial and custom software support the workload?
  5. What are the dependencies or integration touch points with other workloads?

Non-Functional Requirements:

  1. What are the required service levels, performance, capacity, transaction rates and response time? Again, can your cloud implementation support this?
  2. What are the supporting service requirements?  Backup, HA/DR, security or performance monitoring?  Are specific monitoring or security agents required?
  3. Are there encryption, isolation or other types of security and regulatory compliance requirements?

Support & Costs:

  1. What are the support resources and costs for a given workload? For example, if it takes two full-time equivalent employees per server, how much does that resource cost? Also, don’t forget licensing: how does the software vendor deal with cloud implementations of its software, and what are the cost implications?
  2. What are the operational costs for space, power, cooling and so on? What will be saved by migration?

One thing remains constant through all of this – the benefits of moving these workloads must always outweigh the costs and the risks.
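The considerations above lend themselves to a simple scoring pass over your workload inventory, so that the benefit/cost/risk trade-off is explicit rather than anecdotal. Here is a minimal sketch; the criteria, weights, and 1–5 scale are assumptions for illustration, not a prescribed framework.

# Minimal sketch of turning the assessment criteria above into a comparable
# score per workload. Criteria, weights, and the 1-5 scale are assumptions
# chosen for illustration, not a prescribed VMware framework.

CRITERIA_WEIGHTS = {
    "business_impact_fit": 3,   # lower-risk workloads first, early in the cloud journey
    "cloud_readiness": 3,       # architecture fit / refactoring effort
    "technical_fit": 2,         # OS, DB, CPU/memory/storage supportable in the cloud
    "nonfunctional_fit": 2,     # SLAs, HA/DR, security and compliance needs
    "cost_benefit": 3,          # migration + run cost vs. expected savings
}

def migration_score(workload_scores: dict) -> float:
    """workload_scores maps each criterion to a 1 (poor) .. 5 (excellent) rating."""
    weighted = sum(CRITERIA_WEIGHTS[c] * workload_scores.get(c, 1) for c in CRITERIA_WEIGHTS)
    return weighted / sum(CRITERIA_WEIGHTS.values())    # normalized back to 1..5

candidates = {
    "sales-demo-env": {"business_impact_fit": 5, "cloud_readiness": 4, "technical_fit": 4,
                       "nonfunctional_fit": 4, "cost_benefit": 4},
    "core-billing":   {"business_impact_fit": 1, "cloud_readiness": 2, "technical_fit": 3,
                       "nonfunctional_fit": 2, "cost_benefit": 3},
}
for name, scores in sorted(candidates.items(), key=lambda kv: -migration_score(kv[1])):
    print(f"{name}: {migration_score(scores):.1f}")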

To get started on the journey of migrating your workloads to the cloud, remember these takeaways:

  • Always think about how your workload directly affects your company. With a thorough review of each of your workloads, you’ll know what changes to anticipate when you begin the migration process.
  • Make sure you’re thinking in the cloud mindset. Before beginning the migration process, make sure your applications are cloud-ready. If they aren’t already, make sure you have the proper strategy in place to bring them up to cloud-ready speed.
  • Be prepared. Not only do your employees need to know about these changes, but make sure your cloud implementation is prepared for the capacity (including cost) it will take your company to migrate to the cloud.

Check out our list of great blogs on workload migration and stay tuned for Part 2 of this series, where we’ll look at service portfolio mapping and how to determine the target cloud service and deployment model for each candidate workload.

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

Transforming IT Services is More Effective with Org Changes

By: Kevin Lees

Last time, I wrote about the challenge of transforming a traditional IT Ops culture and the value of knowing what you’re up against.

Now I want to suggest some specific organizational changes that – given those cultural barriers – will help you successfully undertake your transformation.

At the heart of the model I’m suggesting is the notion of a Cloud Infrastructure Operation Center of Excellence. What’s key is that it can be adopted even when your org is still grouped into traditional functional silos. 

Aspiration Drives Excellence

A Cloud Infrastructure Operation Center of Excellence is a virtual team comprised of the people occupying your IT org’s core cloud-focused roles: the cloud architect, cloud analyst, cloud developers and cloud administrators. They understand what it means to configure a cloud environment, and how to operate and proactively monitor one. They’re able to identify potential issues and fix them before they impact the service.

Starting out, each of these people can still be based in the existing silos that have grown up within the organization. Initially, you are just identifying specific champions to become virtual members of the Center of Excellence. But they are a team, interacting and meeting on a regular basis, so that from the very beginning they know what’s coming down the pipe in terms of increased capacity or capability of the cloud infrastructure itself, as opposed to demands for individual projects.

Just putting them together isn’t enough, though. We’ve found that it’s essential to make membership in the cloud team an aspirational goal for people within the IT organization. It needs to be a group that people want to be good enough to join and for which they are willing to improve their skills. Working with the cloud team needs to be the newest, greatest thing.

Then, as cloud becomes more prominent and the de facto way things are done, the Cloud Center of Excellence can expand and start absorbing pieces of the other functional teams. Eventually, you’ll have broken down the silos, the Cloud Center of Excellence will be the norm for IT, and everybody will be working together as an integrated unit.

Four Steps to Success

Here are four steps that can help ensure that your Cloud Infrastructure Operation Center of Excellence rollout is a success:

Step 1 – Get executive sponsorship

You need an enthusiastic, proactive executive sponsor for this kind of change.  Indeed, that’s your number one get – there has to be an executive involved who completely embraces this idea and the change it requires, and who’s committed to proactively supporting you.

Step 2 – Identify your team  

Next you need to identify the right individuals within the organization to join your Center of Excellence. IT organizations that go to cloud invariably already run a virtualized environment, which means they already employ people who are focused on virtualization. That’s a great starting point for identifying individuals who are best qualified to form the nucleus of this Center. So ask: Who from your existing virtualization team are the best candidates to start picking up responsibility for the cloud software that gets layered on top of the virtualized base?

Step 3 – Identify the key functional teams that your cloud team should interact with.

This is typically pretty easy because your cloud team has been interacting with these functional teams in the context of virtualization. But you need to formalize the connection and identify a champion within each of these functional teams to become a virtual member of the Center of Excellence. Very importantly, to make that work, the membership has to be part of that person’s job description. That’s a key piece that’s often missed: it can’t just be on top of their day job, or it will never happen. They have to be directly incentivized to make this successful.

Step 4 – Sell the idea

Your next step is basically marketing. The Center of Excellence and those functional team champions must now turn externally within IT and start educating everybody else – being very transparent about what they’re doing, how it has impacted them, how it will impact others within IT and how it can be a positive change for all. You can do brown bag lunches, or webinars that can be recorded and then downloaded and watched, but you need some kind of communication and marketing effort to start educating the others within IT on the new way of doing things, how it’s been successful, and why it’s good for IT in general to start shifting their mindset to this service orientation.

Don’t Forget Tenant Operations 

There’s one last action you need to put in place to really complete your service orientation: create a team that is exclusively focused outwards toward your IT end customers. It’s what we call Cloud Tenant Operations.

Tenant Ops, also called “Service Ops,” is one of three Ops tiers that enable effective operations in the cloud era, as outlined here and here.

One of the most important roles in this team is the customer relationship (or sometimes ‘collaboration’) manager, who is directly responsible for working with the lines of business, understanding their goals and needs, staying in regular contact with them almost like a salesperson, and supporting each line of business in its on-boarding to, and use of, the cloud environment.

They can also feed demand information back to the Center of Excellence for forward capacity planning, helping the cloud team stay ahead of the demand curve by making sure the infrastructure is in place when the lines of business need it.

Tenant Operations is really the counterpart to the Cloud Infrastructure Operation Center of Excellence from a service perspective. It needs to comprise someone who owns the services offered to end customers over their life cycle, a service architect, and service developers who understand the technical implications of the requirements. These requirements come from multiple sources, so the team needs to identify the common virtual applications that can be offered out and consumed by multiple organizations (and teams within organizations), as opposed to doing custom one-off virtual application development.

In a sense, Tenant Operations functions as the DevOps team from a cloud service perspective and really instantiates the concept of a service mindset, becoming the face of the cloud environment to its external end users.

These Changes are Doable

The bottom line here: transforming IT Ops is doable. I have worked with many IT organizations that are successfully making these changes. You can do it too.

Additional Resources

For a comprehensive look at how to best make the transition to a service-oriented cloud infrastructure, check out Kevin’s white paper, Organizing for the Cloud. 

Also look for the VMware Cloud Ops Journey study findings later this month, which highlight common operations capability changes and the drivers for those changes. For future updates, follow us on Twitter at @VMwareCloudOps, and join the conversation by using the #CloudOps and #SDDC hashtags.