
Workload Assessment for Cloud Migration, Part 1: Identifying and Analyzing Your Workloads

By: Andy Troup

Conducting a thorough workload analysis can make or break the success of a cloud strategy.

If you assess workloads well and place them in the appropriate private, hybrid and public cloud environments, you will be better able to fulfill your cloud strategy and deliver greater agility and cost efficiency. If your assessment is unsuccessful, these benefits will be much harder to achieve, and you could see higher costs, lower performance and unhappy customers. Remember, success breeds success: if you have happy customers who are realizing the benefits of your cloud implementation, others will be knocking at your door. If you are unsuccessful, the pipeline of customers will very rapidly dry up.

In this four-part series, I’ll explain four main considerations that you should examine when performing a workload assessment. In this blog, I’ll suggest a framework to use to classify workloads as potential candidates for moving to a cloud environment. My next three blog posts in this series will cover service portfolio mapping, analyzing the cost and benefits of moving to the cloud, and last but not least, stakeholder analysis.

Common Questions

When assessing workloads to identify candidates, I often find myself asking:

  • What criteria should be considered when determining what workloads are a good fit for a new cloud environment?
  • What is the best way to capture and evaluate the criteria with minimal effort and impact on a busy IT department?

A thoughtful and efficient workload assessment framework can simplify and streamline the analysis. Without the right methodology, it can be difficult to know where to start, let alone where to finish. The larger the number of workloads, the more complex the prioritization task becomes.

Here are common considerations and requirements that factor into a potential migration:

Business Impact:

  1. Take a look at the workload and evaluate its impact on your business. Is it a business-critical workload? How does it affect your company? Take the answer to this question and assess it against where you are on your cloud journey. You wouldn’t want to move mission-critical workloads into your cloud during your first days after “go live,” would you?
  2. For which application lifecycle phase will the workload be used (for example, development, test or production)? What are the different requirements for each environment?

Application Architecture:

  1. Is the application written for a cloud environment? If not, make sure you understand the impact of migrating it into the cloud.
  2. How hard or expensive is it to refactor the application for the new environment? For example, do you need to remove hard-coded resource paths? What are the scaling considerations: can you already scale horizontally by adding instances, or can you only scale up by adding more resources to a single instance?

Technical Aspects:

  1. What operating systems, databases or application servers are being consumed or provided and how hard will it be to also migrate them into the cloud?
  2. Do your database, application server and web server run on the same type of platform?
  3. What quantity of CPU, memory, network and storage are typically used/needed? Can your cloud implementation support this?
  4. What commercial and custom software support the workload?
  5. What are the dependencies or integration touch points with other workloads?

Non-Functional Requirements:

  1. What are the required service levels, performance, capacity, transaction rates and response time? Again, can your cloud implementation support this?
  2. What are the supporting service requirements?  Backup, HA/DR, security or performance monitoring?  Are specific monitoring or security agents required?
  3. Are there encryption, isolation or other types of security and regulatory compliance requirements?

Support & Costs:

  1. What are the support resources and cost for a given workload? For example, two full-time equivalent employees per server – how much does this resource cost? Also, don’t forget licensing: how does the software vendor deal with cloud implementations of its software, and what are the cost implications?
  2. What are the operational costs for space, power, cooling and so on? What will be saved by migration?

One thing remains through all of this – the benefits of moving these workloads must always outweigh the costs and the risks.
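
One low-effort way to capture the criteria above is a simple scorecard per workload. The sketch below is illustrative only: the field names, the 1 to 5 scale and the rule for flagging a candidate are all assumptions made for this example, not part of any VMware tool or methodology. Adjust the scores and the rule to suit your own environment.

```python
from dataclasses import dataclass


@dataclass
class WorkloadAssessment:
    """Hypothetical scorecard; every score uses an assumed 1 (low) to 5 (high) scale."""
    name: str
    business_criticality: int   # Business Impact
    refactoring_effort: int     # Application Architecture
    technical_complexity: int   # Technical Aspects (platforms, sizing, dependencies)
    compliance_burden: int      # Non-Functional Requirements
    expected_savings: int       # Support & Costs
    agility_gain: int           # benefit of faster, on-demand provisioning

    def benefit(self) -> int:
        return self.expected_savings + self.agility_gain

    def cost_and_risk(self) -> int:
        return (self.business_criticality + self.refactoring_effort
                + self.technical_complexity + self.compliance_burden)

    def margin(self) -> float:
        # Assumed rule: average benefit score minus average cost/risk score.
        return self.benefit() / 2 - self.cost_and_risk() / 4

    def is_candidate(self) -> bool:
        # A workload qualifies only when benefits outweigh costs and risks.
        return self.margin() > 0


def rank_candidates(workloads):
    """Return only the candidate workloads, largest margin first."""
    candidates = [w for w in workloads if w.is_candidate()]
    return sorted(candidates, key=lambda w: w.margin(), reverse=True)


# Made-up example data.
workloads = [
    WorkloadAssessment("dev-test-portal", 1, 2, 2, 1, 4, 5),
    WorkloadAssessment("core-billing", 5, 4, 4, 5, 3, 2),
]
for w in rank_candidates(workloads):
    print(w.name, round(w.margin(), 2))
```

With these made-up scores, the low-risk development workload is flagged as an early candidate, while the mission-critical billing workload is held back for later in the cloud journey.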

To get started on the journey of migrating your workloads to the cloud, remember these takeaways:

  • Always think about how your workload directly affects your company. With a thorough review of each of your workloads, you’ll know what changes to anticipate when you begin the migration process.
  • Make sure you’re thinking in the cloud mindset. Before beginning the migration process, make sure your applications are cloud-ready. If they aren’t already, make sure you have the proper strategy in place to bring them up to cloud-ready speed.
  • Be prepared. Not only do your employees need to know about these changes, but your cloud implementation must also be ready for the capacity (and the cost) it will take to migrate your workloads to the cloud.

Check out our list of great blogs on workload migration and stay tuned for Part 2 of this series, where we’ll look at service portfolio mapping and how to determine the target cloud service and deployment model for each candidate workload.

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

DevOps and All The Other “Ops Religions”

By: Kurt Milne

I didn’t wake up yesterday thinking, “Today I’ll design a T-shirt for the DevOps Days event in Mountain View.”  But as it turns out – that is what happened.

Some thoughts on what went into my word cloud design:

1. DevOps is great. This will be my 4th year attending DevOps Days. I get the organic, bottom-up nature of the “movement.” I’ve been on the receiving end of the “throw it over the wall” scenario. A culture of collaboration and understanding goes a long way to address the shortcomings of swim lane diagrams, phase gate requirements and mismatched incentives that hamper effective app lifecycle execution. Continuous deployment is inspirational, and the creativity and power of the DevOps tool chain are very cool.

2. EnterpriseOps is still a mighty force. I remember an EnterpriseOps panel discussion at DevOps Days 2010. The general disdain for ITIL, coming from a crowd that was high off of 2 days of Web App goodness at Velocity 2010, was palpable. The participant from heavy equipment manufacturer Caterpillar asked the audience to raise their hand if they had an IT budget of more than $100M. No hands went up in the startup-dominated audience. His reply – “We have a $100M annual spend with multiple vendors.” The awkward silence suggested that EnterpriseOps is a different beast. It was. It still is. There is a lot EnterpriseOps can learn from DevOps, but the problems dealing with massive scale and legacy are just different.

3. InfraOps, AppOps, Service Ops. This model developed by James Urquhart makes sense to me. It especially makes sense in the era of Shape Shifting Killer Apps. We need a multi-tier model that addresses the challenges of running infrastructure (yes, even in the cloud era), the challenges of keeping the lights on behind the API in a distributed-component SOA environment and the cool development techniques that shift uptime responsibility to developers, as pioneered by Netflix. Clear division of labor with separation of duties, and a bright light shining on the white space in between, is a model that seems to address the needs of every cloud era constituent.

4. Missing from this 3-tier model is ConsumerOps. Oops. Too late to update the shirt design. Many are consuming IT services offered by cloud service providers; there must be a set of Ops practices that help guide cloud consumption. Understanding and negotiating cloud vendor SLAs and architecting for multiple AWS availability zones immediately come to mind. Being a service broker and including 3rd party cloud services as part of an integrated service catalog is another.

5. Tenant Ops. As far as I can tell, this term was coined by Kevin Lees and the Cloud Operations Transformation services team at VMware. See pages 17 and 21 in Kevin’s paper on Organizing for the Cloud. It includes customer relationship management, service governance, design and release, as well as ongoing management of services in a multi-tenant environment. VMware internal IT uses the term to describe what they do running our private cloud internally. They have a pie chart that shows the percentage of compute units allocated to different tenants (development, marketing, sales, customer support, etc). It works. It may be similar to ServiceOps in the three tier model, but feels different enough, with a focus on multi-tenancy and not API driven services, to deserve its own term.

6. Finally CloudOps. This term is meta. It encompasses many of the concepts and practices of all the others. This is a term that describes IT Operations in the Cloud Era. Not just in a cloud, or connected to a cloud. But in the cloud era. The distinction being that the “cloud era” is different than the “client server era,” and implies that many practices developed in the previous era no longer apply. Many still do. But dynamic service delivery models are a forcing function for operational change. That change is happening in five pillars of cloud ops: People, Process, Organization, Governance, and IT business.

So while some of the sessions at this year’s DevOps conference are focused on continuous deployment, I’d bet that all the topics of the “Ops religions” will be covered. Hence the focus on the term CloudOps.

We’ll be live tweeting from DevOps next Friday. Follow us @VMwareCloudOps or join the discussion using the #CloudOps hashtag.

Consider joining the new VMUG CloudOps SIG or find out more about it during the VMUG June 27th webcast.

Reaching Common Ground When Defining Services – Highlights from #CloudOpsChat

On May 30th, we hosted our monthly #CloudOpsChat on “Reaching Common Ground When Defining Services.” Thanks to all who participated for making it an informative and engaging conversation. We would also like to thank John Dixon (@GreenPagesIT) from GreenPages and Khalid Hakim from VMware (@KhalidHakim47) for co-hosting the chat with us.

To kick off the chat we asked, “What exactly is an IT service?”

Our co-host @KhalidHakim47 suggested they are intangible by nature, unlike products. Our other co-host, @GreenPagesIT, gave the textbook answer: IT services are an asset worthy of investment. He added that an application alone is not an IT service. @kurtmilne defined an IT service as something designed to deliver something to someone in a form or function that meets their need. @AngeloLuciani said that an IT service delivers a business outcome. @KongYang saw it as a bounded deliverable that states which things are being provided by whom and the support that’s to be rendered when things fail.

Next we asked, “Why should you define services in the first place?” Followed by, “What are the benefits of doing so for your users?”

@KhalidHakim47 started off by saying that you cannot claim you manage services until they are defined in the first place. @kurtmilne said service definitions set expectations, which are a key dependency for creating satisfied users. @jfrappier added to Khalid’s point, saying that you also can’t control your public cloud vendors, so as a consumer you need clear definitions. Khalid went on to say that without a service definition, the boundaries may be loose between IT deliverables – setting expectations becomes much clearer when you address a well-defined service. @harrowandy chipped in saying the definition of services helps to make sure that the customer and IT are expecting the same outcome, with which @alamo_jose agreed. Co-host @GreenPagesIT said IT services help to organize people around a delivery objective instead of a technology objective.

We then noted that multiple roles contribute to specifying a service definition and asked, “What roles are involved in defining each service?”

@KhalidHakim47 argued that the driving and accountable role for defining a service is the service owner/manager, but it is not a one-man show. According to Khalid, @CloudOpsVoice and @alamo_jose, some of the key roles involved include the Business Unit Liaison, IT Service Manager, Consumer Relationship Manager, Portfolio/Catalog Manager and Architect, the Service Liaison Manager and Service Catalog Manager. Co-host @GreenpagesIT explained that at first pass, it’s a small group that defines the service, but eventually more parties become involved as you roll into CSI. @harrowandy said the service must have an owner who takes the service from cradle to grave and from initiation to retirement.

We then asked our audience, “Are there recommended approaches to getting multiple groups of users to reach consensus in their service definition?”

@AngeloLuciani explained that groups need to be driven by the business strategy and outcomes. @harrowandy agreed, adding that if groups don’t know the business strategy, how can IT provide them what they want? Co-host @KhalidHakim47 suggested that during the service definition planning phase, all roles that are expected should be looped into the exercise with clear goals and outcomes. @KongYang made a great analogy, saying too many chefs in the kitchen will kill the service – instead, we should look to have one chef for one service, a point with which many of our participants agreed.

Next, co-host @GreenPagesIT wondered: “Are there recommended approaches to balancing the needs of both IT and service consumers?”

@kurtmilne said that IT can deliver fast and cheap if standardized, but slow and expensive if customized. Agreeing, @KhalidHakim47 said there’s a balancing act between packaging/standardizing and customizing. @harrowandy suggested using the “80/20” rule: You can get 80% of what you want now, or wait a certain number of weeks for the remaining 20%. Kurt also brought up the fact that IT service standardization gives users more flexibility at the business process level, with which @alamo_jose agreed, adding that IT must help the business understand that reality. Co-host @KhalidHakim47 noted that standardization drives efficiency, but allowing more service levels gives more freedom as well. Co-host @GreenPagesIT added that requirements, not specs, should be negotiated during the service definition.

Switching gears, we then asked “What service components do you think should be included in a service definition?”

@kurtmilne stated that pricing services is key – pricing requires accurate costing, and costing requires clear service definition, thus making the whole process come full circle. @alamo_jose added that ownership, SLA/OLA, a clear definition, features, cost and related services should all be included. Co-host @GreenPagesIT said that knowledge of how to access the service is a necessary service component, as well as hours of operation.
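
The components mentioned by the participants can be captured in a simple record, which also makes it easy to spot what a draft definition still lacks. This is only a sketch: the class, its field names and the example values are assumptions made for illustration and do not come from any particular service catalog product.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ServiceDefinition:
    """Hypothetical record covering the components raised in the chat."""
    name: str
    owner: str
    description: str                       # the "clear definition"
    features: List[str] = field(default_factory=list)
    sla: str = ""                          # SLA/OLA summary
    cost_per_month: Optional[float] = None
    related_services: List[str] = field(default_factory=list)
    how_to_access: str = ""
    hours_of_operation: str = ""

    def missing_components(self) -> List[str]:
        """List the fields that are still empty in this definition."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == "" or value == []:
                missing.append(f.name)
        return missing


# Made-up draft definition with several components not yet agreed.
draft = ServiceDefinition(
    name="Dev/Test Environment",
    owner="Service Owner",
    description="Self-service development and test stack",
    features=["three-tier stack"],
    sla="availability target to be agreed",
)
print(draft.missing_components())
```

Here the draft still lacks a cost, related services, access instructions and hours of operation, which is the kind of gap a service owner would want to close before publishing.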

To round off the chat we closed with the question, “What do you do after you define services? What are the next steps?”

For @jfrappier, the answer was, “IT needs to define, then document and automate.” @alamo_jose chipped in, saying that once the service is defined, it should be published in the Service Catalog, with @AngeloLuciani adding that IT also needs to educate and communicate on how to leverage the services. @ckulchar, however, had a very different answer – once services are defined and delivered, he suggested, users should drink beer and celebrate!

Thanks again to everybody who participated in our #CloudOpsChat, and stay tuned for details about our next #CloudOpsChat!

Feel free to tweet us at @VMwareCloudOps with any questions or feedback, and join the conversation by using the #CloudOps and #SDDC hashtags.

Transforming IT Services is More Effective with Org Changes

By: Kevin Lees

Last time, I wrote about the challenge of transforming a traditional IT Ops culture and the value of knowing what you’re up against.

Now I want to suggest some specific organizational changes that – given those cultural barriers – will help you successfully undertake your transformation.

At the heart of the model I’m suggesting is the notion of a Cloud Infrastructure Operation Center of Excellence. What’s key is that it can be adopted even when your org is still grouped into traditional functional silos. 

Aspiration Drives Excellence

A Cloud Infrastructure Operation Center of Excellence is a virtual team comprised of the people occupying your IT org’s core cloud-focused roles: the cloud architect, cloud analyst, cloud developers and cloud administrators. They understand what it means to configure a cloud environment, and how to operate and proactively monitor one. They’re able to identify potential issues and fix them before they impact the service.

Starting out, each of these people can still be based in the existing silos that have grown up within the organization. Initially, you are just identifying specific champions to become virtual members of the Center of Excellence. But they are a team, interacting and meeting on a regular basis, so that from the very beginning they know what’s coming down the pipe in terms of increased capacity or capability of the cloud infrastructure itself, as opposed to demands for individual projects.

Just putting them together isn’t enough, though. We’ve found that it’s essential to make membership of the cloud team an aspirational goal for people within the IT organization. It needs to be a group that people want to be good enough to join and for which they are willing to improve their skills. Working with the cloud team needs to be the newest, greatest thing.

Then, as cloud becomes more prominent and the de facto way things are done, the Cloud Center of Excellence can expand and start absorbing pieces of the other functional teams. Eventually, you’ll have broken down the silos, the Cloud Center of Excellence will be the norm for IT, and everybody will be working together as an integrated unit.

Four Steps to Success

Here are four steps that can help ensure that your Cloud Infrastructure Operation Center of Excellence rollout is a success:

Step 1 – Get executive sponsorship

You need an enthusiastic, proactive executive sponsor for this kind of change.  Indeed, that’s your number one get – there has to be an executive involved who completely embraces this idea and the change it requires, and who’s committed to proactively supporting you.

Step 2 – Identify your team  

Next you need to identify the right individuals within the organization to join your Center of Excellence. IT organizations that go to cloud invariably already run a virtualized environment, which means they already employ people who are focused on virtualization. That’s a great starting point for identifying individuals who are best qualified to form the nucleus of this Center. So ask: Who from your existing virtualization team are the best candidates to start picking up responsibility for the cloud software that gets layered on top of the virtualized base?

Step 3 – Identify the key functional teams that your cloud team should interact with.

This is typically pretty easy because your cloud team has been interacting with these functional teams in the context of virtualization. But you need to formalize the connection and identify a champion within each of these functional teams to become a virtual member of the Center of Excellence. Very importantly, to make that work, the membership has to be part of that person’s job description. That’s a key piece that’s often missed: it can’t just be on top of their day job, or it will never happen. They have to be directly incentivized to make this successful.

Step 4 – Sell the idea

Your next step is basically marketing. The Center of Excellence and those functional team champions must now turn externally within IT and start educating everybody else – being very transparent about what they’re doing, how it has impacted them, how it will impact others within IT and how it can be a positive change for all. You can do brown bag lunches, or webinars that can be recorded and then downloaded and watched, but you need some kind of communication and marketing effort to start educating the others within IT on the new way of doing things, how it’s been successful, and why it’s good for IT in general to start shifting their mindset to this service orientation.

Don’t Forget Tenant Operations 

There’s one last action you need to put in place to really complete your service orientation: create a team that is exclusively focused outwards toward your IT end customers. It’s what we call Cloud Tenant Operations.

Tenant Ops, also called “Service Ops,” is one of the three Ops tiers that enable effective operations in the cloud era.

One of the most important roles in this team is the customer relationship (or sometimes ‘collaboration’) manager who is directly responsible for working with the lines of business, understanding their goals and needs, and staying in regular contact with them, almost like a salesperson, and supporting that line of business in their on-boarding to, and use of, the cloud environment.

They can also provide demand information back to the Center of Excellence to help with forward capacity planning, helping the cloud team stay ahead of the demand curve by making sure they have the infrastructure in place when the lines of business need it.

Tenant Operations is really the counterpart to the Cloud Infrastructure Operation Center of Excellence from a service perspective – it needs to comprise someone who owns the services offered to end customers over their life cycle, a service architect, and service developers who understand the technical implications of the requirements. These requirements are coming from multiple sources, so the team needs to identify the common virtual applications that can be offered out and consumed by multiple organizations (and teams within organizations) as opposed to doing custom one-off virtual application development.

In a sense, Tenant Operations functions as the dev ops team from a cloud service perspective and really instantiates the concept of a service mindset, becoming the face to the external end users of the cloud environment.

These Changes are Doable

The bottom line here: transforming IT Ops is doable. I have worked with many IT organizations that are successfully making these changes. You can do it too.

Additional Resources

For a comprehensive look at how to best make the transition to a service-oriented cloud infrastructure, check out Kevin’s white paper, Organizing for the Cloud. 

Also look for the VMware Cloud Ops Journey study findings later this month, which highlight common operations capability changes and the drivers for those changes. For future updates, follow us on Twitter at @VMwareCloudOps, and join the conversation by using the #CloudOps and #SDDC hashtags.

Transforming IT Services Starts With a Culture Shift

By: Kevin Lees

It’s happening. In place of their traditional, project- and technology-based approach, IT organizations really are making the shift to deliver IT as a service.

My last post examined what an IT service looks like in practice. But what if you’ve only gone as far as deciding that you need to transform IT? How do you act on that decision?

Your first priority, I’d argue, is to understand how functional silos create an anchor for your organization’s culture, and how that may be your biggest barrier to change. That’s what I’ll be looking at here. In part 2, I’ll suggest a solution for specific organizational changes that address the culture shift problem.

Changing Minds to Change Behavior

For context, here’s the IT model you’re leaving behind: a project request comes in with specific technology or capacity requirements. You procure the infrastructure and build a custom environment and then turn that over to the development team (which is often really a back and forth affair between Dev and Ops, where the final solution doesn’t really look like the initial request). When the new capability is moved into production, you take over the management and maintenance of that application and underlying infrastructure environment.

Here’s where you’re going: well before you get any requests, you build an environment that can be reused across many different development teams. You deliver that environment as a highly standardized service that’s a best fit for all the teams you serve. They request and deploy on demand with little or no IT Ops involvement in the deployment. Developers can customize their deployment to some degree, by selecting from a small set of highly standardized service options or configuration choices.
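
To make the idea of “a small set of highly standardized service options” concrete, here is a minimal sketch of how a self-service request could be checked against a catalog before it is deployed without IT Ops involvement. The catalog contents, the option names and the function are hypothetical and exist only for this illustration.

```python
# Hypothetical catalog: each service exposes only a few standardized options.
CATALOG = {
    "dev-test-stack": {
        "size": {"small", "medium", "large"},
        "runtime": {"dotnet", "java"},
    },
}


def validate_request(service, choices):
    """Return a list of problems; an empty list means the request is standard."""
    if service not in CATALOG:
        return [f"unknown service: {service}"]
    options = CATALOG[service]
    problems = []
    # Every standardized option must be chosen from its allowed values.
    for option, allowed in options.items():
        if option not in choices:
            problems.append(f"missing option: {option}")
        elif choices[option] not in allowed:
            problems.append(f"unsupported {option}: {choices[option]}")
    # Anything outside the catalog is a custom request, not a service option.
    for option in choices:
        if option not in options:
            problems.append(f"not customizable: {option}")
    return problems


print(validate_request("dev-test-stack", {"size": "medium", "runtime": "java"}))
print(validate_request("dev-test-stack",
                       {"size": "huge", "runtime": "java", "kernel": "custom"}))
```

The first request fits the standard options and could be deployed on demand. The second asks for things the service does not offer, which is exactly the kind of request that would pull IT Ops back into a custom, project-style build.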

Leaving the one behind and moving to the other requires new software tools, as well as hardware that can handle the demands of a pooled resource environment. But the real transformation is a shift in mindset. And it’s one that can be hugely challenging for an IT group to both make initially and sustain over time.

I’ve seen this at many IT groups I work directly with. The fact that “It’s just not the way we’ve done things in the past” in itself becomes the obstacle to change.

Breaking Structural Bonds

Team A, for example, has always done their thing and then handed it off to team B who does their thing, who hands it off to the next team. Even with carefully crafted swim lane diagrams, phase gate checklists, and continuous process improvement – it can literally take months to deploy an environment for a development team.

Over time, large IT organizations build a series of silos that  develop deep expertise to facilitate that process: a network silo, a security silo, a storage silo, and so on.  They optimize the steps and sub-optimize the process.

But you’re now looking to move to a situation where everyone works in a much more integrated way: together and not sequentially. After all, with a cloud services-oriented operation, things happen so fast and in such an integrated way that trying to work within the context of these silos and linear processes does nothing but slow the process down, which defeats the whole purpose of making the change.

So for change to happen, the silos have to go.

Fear, Uncertainty . . .  a Plan

Propose ditching silos, though, and people immediately start fearing for their own job security. They won’t know what it will take to do well anymore – deepening expertise was a well worn path to recognition, certifications and a raise. Talk of breaking down this structure conjures in them that awful trinity: fear, uncertainty, doubt.

It’s an understandable reaction and it’s important to anticipate and plan for. But you now know 1) what you want and 2) what you’re up against. You’re ahead of the game.

It is time to own the problem!

In my next blog post, I’ll outline a concrete set of actions that will help you successfully change your organizational culture – reengineering your Ops team to dynamically deliver services to end customers through a cloud infrastructure.

For future updates, be sure to follow @VMwareCloudOps on Twitter and use the #CloudOps and #SDDC hashtags to join the conversation.

Additional Resources

View Kevin Lees’ webcast, 5 Key Steps to Effective IT Ops in a Hybrid World, for more information about specific changes that can help IT be more service-oriented.

The Illusion of Unlimited Capacity

By: Andy Troup 

I was at a customer workshop last week, and I used a phrase that I’ve used a few times to describe one of the crucial capabilities of a successful cloud computing service, namely “The Illusion of Unlimited Capacity.” It got a bit of a reaction, and people seemed to understand the concept quite easily. So apart from its sounding quite cool (maybe I should get out more), why do I keep on using this term?

Well, in cloud computing, we all know that there is no such thing as unlimited capacity – everything is finite. Every cloud provider only has a limited number of servers, a limited amount of storage capacity, and a limited number of virtual and physical network ports – you get the idea, it’s all limited, right?

Paradoxically, though, providers of cloud resources have to make sure their customers believe the opposite: that there is no end to what can be consumed.

The National Institute of Standards and Technology (NIST) defines one of the characteristics of cloud computing as on-demand self-service; i.e. the user can consume what they want, when they want it. Now, for cloud providers to provide on-demand self-service, they need to be confident that they can fulfill all the requests coming from all their consumers, immediately. They need to maintain, in other words, an illusion of unlimited capacity.

If at any point a consumer makes a request, and the cloud portal they use responds with a “NO” because it’s run out of cloud resources, this illusion has gone. That has real consequences. As it is very easy for consumers to move between cloud providers, it’s very likely that the provider will have lost them as customers and will find it very hard to get them back. Remember, even for internal IT cloud providers, it’s a competitive market place and the customer is king.

So, when defining your cloud strategy, you want to make sure that maintaining ‘the illusion of unlimited capacity’ is on your list. It may not be something you need to consider initially, but when demand for your services increases, you need to be ready to deal with the challenge. To prepare for it, here are 5 things you should start thinking about:

  • Understand your customers – build a strong relationship with your customers, understand their business plans, and use this information to understand the impact those plans will have on the demand for your cloud services.
  • Implement the appropriate tooling – so you can not only understand demand for your cloud capacity today, but also forecast future demand.
  • Consider the Hybrid Cloud – think about how you would burst services in and out of a hybrid cloud and when you would need to do it. Before you actually need to do this, make sure you plan, prepare and automate (where possible), so that everything is in place when it’s needed. Don’t wait until it’s too late.
  • Train users on service consumption etiquette – if they know they can get what they need when they need it, they will be less inclined to hoard resources. And if they aren’t hoarding resources, the pressure to predict their future demand (which can be difficult) will be reduced, because resources are being used more efficiently. Why not agree that they won’t have to plan capacity if they “turn it off” when done, thus freeing resources back to the pool and further increasing spare capacity.
  • Kill zombie workloads – be aware of services that aren’t being used and turn them off (after having a conversation with the customer). Also, encourage the use of leases for temporary services when appropriate.
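
Two of the points above, forecasting demand and spotting zombie workloads, lend themselves to a small worked example. The sketch below is a rough illustration under stated assumptions: it fits a straight-line trend to daily usage samples (real demand is rarely that tidy), and it treats any workload idle for 30 days as a zombie candidate. The function names, the threshold and the sample data are all made up for this example and are not taken from any capacity management tool.

```python
from datetime import date, timedelta


def days_until_exhaustion(daily_usage, capacity):
    """Estimate days left before usage reaches capacity.

    Fits a least-squares line to the samples (one per day) and divides the
    remaining headroom above the latest sample by the slope. Returns None
    when usage is flat or falling.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    if slope <= 0:
        return None
    return max(0.0, (capacity - daily_usage[-1]) / slope)


def find_zombies(last_active, today, idle_days=30):
    """Return names of workloads with no activity for more than idle_days."""
    cutoff = today - timedelta(days=idle_days)
    return sorted(name for name, seen in last_active.items() if seen < cutoff)


# Made-up samples: vCPUs in use on four consecutive days, pool of 200 vCPUs.
print(days_until_exhaustion([100, 110, 120, 130], capacity=200))  # 7.0

# Made-up last-activity dates per workload.
last_active = {
    "build-agent-7": date(2013, 1, 5),
    "web-frontend": date(2013, 6, 1),
}
print(find_zombies(last_active, today=date(2013, 6, 10)))  # ['build-agent-7']
```

A forecast like this is only a prompt for a conversation with your customers about their plans, and a zombie list is only a prompt for a conversation before anything is switched off.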

Finally, going back to the essential characteristics of cloud computing as defined by the National Institute of Standards and Technology (NIST) (here is the very short document for those of you that haven’t read it), one other characteristic is rapid elasticity.

If you think about it, this article is really all about rapid elasticity. It’s just another way of saying that you need to maintain the illusion of unlimited capacity. Now, put on your top hat, hold on to your magic wand, and keep the illusion going.

For future updates, follow @VMwareCloudOps on Twitter and join the conversation using the #CloudOps and #SDDC hashtags.

What Do We Mean by IT Services in the Cloud Era?

By Kevin Lees

You hear it all the time from cloud evangelists: instead of delivering based on projects, IT should now be delivering around a common set of services.

It’s not a new idea—but cloud computing promises to finally make it a reality.

Before we get too excited, though, we should ask: what do we actually mean by cloud services? That’s not something cloud advocates always make clear.

So here’s an example:

The other week I was talking with a customer who runs a cloud that supports production dev test environments for a government agency. These environments are in turn supporting mission-critical applications that play a major role in maintaining the public’s health.

From a service perspective, the tenant ops team is identifying and building a set of common development platforms as virtual applications. In this case each platform consists of three tiers, with each tier running a Windows operating system that’s been pre-built to meet government security policies. The composite platforms all have monitoring drivers already installed, and also feature commonly used development environments – in this case they’re either a Microsoft .NET-type environment or Java-based.

Collectively, that creates a common virtual dev test vApp pre-built with a lot of the core capabilities and requirements to do this type of mission-critical application development. My customer’s team is then offering this multi-tier stack as a “service” via self-service, on-demand provisioning.
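To make the idea of a pre-built stack offered as a service a little more concrete, here is a small Python sketch of how such a definition might be written down and sanity-checked. It is not how the customer’s vApp is actually defined. The tier role names (“web”, “app”, “db”), the `windows-hardened` image name and the class names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

ALLOWED_RUNTIMES = {".NET", "Java"}  # the two environments mentioned above


@dataclass(frozen=True)
class Tier:
    role: str                        # hypothetical tier names: "web", "app", "db"
    os_image: str                    # hypothetical name of a hardened OS template
    monitoring_installed: bool = True


@dataclass(frozen=True)
class DevTestService:
    name: str
    runtime: str
    tiers: Tuple[Tier, ...]


def validate(service: DevTestService) -> List[str]:
    """Return a list of problems; an empty list means the definition is consistent."""
    problems = []
    if service.runtime not in ALLOWED_RUNTIMES:
        problems.append(f"unsupported runtime: {service.runtime}")
    if len(service.tiers) != 3:
        problems.append(f"expected 3 tiers, found {len(service.tiers)}")
    for tier in service.tiers:
        if not tier.monitoring_installed:
            problems.append(f"tier '{tier.role}' is missing monitoring")
    return problems


java_stack = DevTestService(
    name="devtest-java",
    runtime="Java",
    tiers=tuple(Tier(role, "windows-hardened") for role in ("web", "app", "db")),
)
print(validate(java_stack))
```

Writing the standard down as data, rather than as a build document, is what lets the same definition be reused by several teams, with any local tweaks layered on top.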

In the past, it could have taken two to three months to stand up something like this for a new round of development and testing. Now, with these prepackaged, common services, a new development environment can be deployed in less than an hour.

It’s a great example of how quickly you can provision, not only from an infrastructure perspective, but also so that developers don’t have to repeatedly start out with raw infrastructure and build all of their own environments.

This standardized, pre-packaged development environment can also be used across multiple development teams and even across multiple departments. Each may need to do some tweaking for their particular area, but it saves everyone an enormous amount of work.


Service Definition – The Tradeoff Between Standardization and Agility

By Rohan Kalra and Pierre Moncassin

In the client-server era, IT demonstrated responsiveness by designing infrastructure to meet the technical requirements of various applications that the business relied on to do work. Developers spec’d systems. Ops built the systems. Devs changed the specs. The back and forth continued until the systems were live in production.

There were attempts to enforce architecture standards that were designed to control the chaos of having every system be a unique work of art, but business needs, whatever they happened to be, typically trumped IT’s need for simplicity. If developers for a critical business application demanded some unique middleware configuration, they usually got what they requested.

As a result, most IT organizations have racks full of one-off systems that are unique and often hard to support.  “A museum of past technology decisions” is one way to describe the typical enterprise datacenter landscape.

Cloud changes everything

Cloud computing changes this paradigm. With cloud, developers and users experience the value of fast access to standardized commodity compute resources. By accepting and designing around standard resource configurations, developers no longer need to predict usage levels to set capacity requirements, and no longer have to wait through long procurement cycles. Similarly, by accepting one-size-fits-all, consumers can get immediate access to a wide range of ready-to-use apps.

The trade-off IT consumers make is essentially one of releasing control over technical assets in order to gain control over business processes. In return for accepting increased standardization (typically at the ‘nuts and bolts’ level, e.g. infrastructure, catalog, OLAs, charging models), they get unprecedented agility at the business level (“on-demand” IT, in the form of both provisioning and scaling as usage levels change).

In the cloud era, IT demonstrates responsiveness by giving developers and users immediate access to standard IT services accessed and then scaled on demand.

As a result, IT success in the cloud era depends, to a large extent, on IT consumers understanding the tradeoff and appreciating the value of standardization.

Start with common service definition

The first step to achieving standardization is getting agreement on a common service definition. This includes getting multiple groups that traditionally have requested and received custom work to agree on the details of standard services. There is an art in building this consensus, as different consumers with unique requirements need to come together to make this a success. The key is communication and consistency, from the collection of requirements through to the delivery of services. (More on this process in a future blog post.)

Another critical step is standardizing and centralizing an organization’s service catalog and portal. This allows for a consistent and secure customer experience that provides access across all services regardless of underlying environment – physical, virtual, as well as private and public cloud resources.

Standardization also enables IT to be a true service broker, picking the right environment to meet the needs of each service or workload. A service broker strategy includes policy-based governance, service-based costing, and end-to-end life cycle management across all types of internal and external services.
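As a simplified sketch of what “picking the right environment” under policy-based governance could look like, the Python below places each request in the cheapest environment its policy allows. The single policy rule (sensitive data must stay in an environment approved for it), the environment names and the hourly costs are all invented for illustration. A real broker would weigh many more factors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Environment:
    name: str
    handles_sensitive_data: bool
    cost_per_hour: float       # illustrative unit cost, not real pricing


@dataclass(frozen=True)
class Request:
    service: str
    sensitive_data: bool


def place(request: Request, environments: List[Environment]) -> Optional[Environment]:
    """Pick the cheapest environment that satisfies the governance policy.

    Returns None when no environment is allowed to host the request.
    """
    allowed = [
        env for env in environments
        if env.handles_sensitive_data or not request.sensitive_data
    ]
    return min(allowed, key=lambda env: env.cost_per_hour, default=None)


# Hypothetical environments and requests.
environments = [
    Environment("private-cloud", handles_sensitive_data=True, cost_per_hour=0.12),
    Environment("public-cloud", handles_sensitive_data=False, cost_per_hour=0.08),
]

print(place(Request("devtest-java", sensitive_data=False), environments).name)
print(place(Request("payroll", sensitive_data=True), environments).name)
```

The point of the sketch is the ordering: governance filters the options first, and cost only decides among environments that are already permitted.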

Today, organizations that understand the need for standardization are the ones transforming themselves to be more responsive with cloud-based operating models. For them, standardization is the driver to both increase business agility, and become more efficient from an OPEX perspective.

Key actions you can take:

1. Acknowledge the problem.

Is this true within your organization?

  • Multiple single points of failure?
  • Specific individuals supporting legacy applications without documented runbooks or recovery procedures?
  • Continuous fire-fights due to complex architectures leading to business downtime?
  • Inefficient manual procedures?
  • War-room-like setups to solve problems, with little or no root cause analysis or problem-solving measures for the future?

2. Before embarking on the journey, take stock candidly of what is actually being delivered today. Ask probing questions about your current-state services.

  • What service levels are actually being delivered (not just promised ‘on paper’)?
  • What services look ‘gold plated’ and could be simplified?
  • What services are never, or only very occasionally, used?

Once you have a firm baseline, you are ready to start the journey.

3. Understand it’s a journey and it takes time. There is no big bang answer to solving this problem.

  • Start with small wins within your organization’s cloud transformation.
  • Development environments are ideal proving grounds.
  • Initiate a cloud-first policy.

4. Create a cloud strategy and focus on building business consensus through business communication and outreach.

For more on this topic, join Khalid Hakim with John Dixon of GreenPages for the May 30th #CloudOpsChat on Reaching Common Ground When Defining Services!

For future updates, follow us on Twitter at @VMwareCloudOps and join the conversation by using the #CloudOps and #SDDC hashtags.

Reaching Common Ground When Defining Services – Join Us For #CloudOpsChat!

An optimized service definition process can make or break the success of hybrid clouds or Software-Defined Data Centers (SDDC). But even if you have tools and processes in place to enable automated provisioning, you still need key stakeholder agreement on the makeup of standard services and resource configurations.

  • Standardized services need to meet the needs of those who request and consume the services.  But they also need to make sense to those in IT responsible for both automation that delivers the services and ongoing support.
  • Standardization helps increase flexibility at the business process level. But rigid service definitions can also inhibit those who both consume and deliver the services.

So how can you meet the needs of multiple groups and find common ground when it comes to defining services?

Find out by joining our next #CloudOpsChat on “Reaching Common Ground When Defining Services” taking place on Thursday, May 30th at 11am PT.

The event will be co-hosted by two CloudOps pros who have helped multiple VMware customers reach common ground:

  • John Dixon, Consulting Architect at GreenPages (see John’s posts on GreenPages Journey to the Cloud blog)
  • Khalid Hakim, Cloud Operations Architect at VMware

During the chat, we will answer the tough questions:

  • What service components should be included in a standard service definition?
  • What components can be flexible for modification around the edges?
  • Are there obvious points of abstraction that help balance standardization and flexibility?
  • Are there recommended approaches to getting multiple groups of users to reach consensus?
  • Are there recommended approaches to balancing the needs of both IT and service consumers?
  • What happens if key stakeholders don’t reach consensus?

Here’s how to participate in #CloudOpsChat:

  • Follow the #CloudOpsChat hashtag (via TweetChat, TweetGrid, TweetDeck, or another Twitter client) and watch the real-time stream.
  • On Thursday, May 30th at 11am PT, @VMwareCloudOps will pose a few questions using the #CloudOpsChat hashtag to get the conversation rolling.
  • Tag your tweets with the #CloudOpsChat hashtag. @reply other participants and react to their questions, comments, thoughts via #CloudOpsChat. Engage with each other!
  • #CloudOpsChat should last about an hour.

In the meantime, feel free to tweet at us at @VMwareCloudOps with any questions you may have. We look forward to seeing you in the stream!

The Lowly Metric Has Its Day in the Sun

By Rich Benoit

Back in the day, I would have killed for a tool like vCOps, an analytics tool that uses dynamic thresholds to make sense of the myriad activity metrics that exist in an IT environment. Without dynamic thresholds that identify normal behavior, admins like me were forced to use static thresholds that never seemed to work quite right. Static thresholds tended either to be set too low, resulting in false positives, or too high, so that by the time they were tripped, the support desk had already started receiving calls from disgruntled users.

Tried, but Failed

  • One approach I tried in order to make sense of the cloud of data coming from multiple monitoring tools was to combine several metrics to get a more holistic view. Combined metrics also rely on static thresholds and are similarly plagued with false positives. But they introduce the additional problem of having to figure out which of the underlying metrics actually caused the alarm to trip.
  • Another approach I tried was using end-user experience monitoring, or end-to-end application monitoring. Instead of trying to estimate the performance of an application by looking at the sum of all of its components, I could instead look at the simulated response time for the typical user and transaction. Another end-to-end monitoring tactic was to employ passive application sniffers that would record the response time of transactions. But with both approaches, I was still dependent on static hard thresholds that were invariably exceeded on a regular basis. For example, it wouldn’t be unusual for an application to exceed its 2-second response time goal during regular periods of peak usage. So I had to know when it was normal to exceed the allowed threshold.  In other words, I had to know when to ignore the alarms.
  • Static thresholds also impacted performance monitoring. Other admins would ask, “Did this just start?” or “Is the performance issue the result of a change in the environment?” The monitoring tools wouldn’t provide the needed data. So we would have to roll up our sleeves and try to figure out what happened. Meanwhile the system would be down or just struggling along. Many times the problem would go away after a certain amount of time or after a reboot, only to resurface another day.

In the end, except for a few cases, we just turned off the monitors and alarms.

A Better Approach

That is why I would have killed for vCOps. vCenter Operations Management Suite is built on an open and extensible platform that works with physical and virtual machines. It is a single solution that works with a variety of hypervisors and fits either on-premises or public cloud environments.

It collects and stores metrics over time and works behind the scenes to establish dynamic thresholds. It employs around 18 different algorithms that compete to best fit any one of the millions of metrics it can track. Some algorithms are based on time intervals and others on mathematical models.

With vCOps I can now designate specific metrics as KPIs for additional granularity. For example, the tool would learn that it is normal for response times to be in the 2 to 4 second range on Monday mornings, but if they move outside the normal range, above or below, I can now have a KPI Smart Alert generated.
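To show the general idea of a dynamic threshold, here is a deliberately simple Python sketch. It is not one of the algorithms vCOps uses. It stands in for them with a single assumed technique: learn a normal range per time slot as the mean plus or minus three standard deviations of past readings. The slot labels and sample response times are made up for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def learn_ranges(samples, k=3.0):
    """Learn a normal (low, high) range per time slot.

    samples: iterable of (slot, value) pairs, where slot is any hashable label
    such as ("Mon", 9). Slots with fewer than two samples are skipped.
    """
    by_slot = defaultdict(list)
    for slot, value in samples:
        by_slot[slot].append(value)
    ranges = {}
    for slot, values in by_slot.items():
        if len(values) >= 2:
            m, s = mean(values), stdev(values)
            ranges[slot] = (m - k * s, m + k * s)
    return ranges


def is_anomalous(ranges, slot, value):
    """True if value falls outside the learned range; unknown slots never alert."""
    if slot not in ranges:
        return False
    low, high = ranges[slot]
    return value < low or value > high


# Made-up response times in seconds: a busy Monday morning, a quiet Tuesday afternoon.
history = [(("Mon", 9), v) for v in (2.8, 3.1, 3.0, 2.9, 3.2)]
history += [(("Tue", 14), v) for v in (0.9, 1.0, 1.1, 1.0, 1.0)]
ranges = learn_ranges(history)

print(is_anomalous(ranges, ("Mon", 9), 3.1))   # usual for a Monday morning
print(is_anomalous(ranges, ("Tue", 14), 3.1))  # same value, unusual for this slot
```

The same 3.1-second reading is unremarkable in one slot and an anomaly in the other, which is exactly what a single static threshold cannot express. Readings that fall below the learned range are flagged as well.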

Another thing that I can use is the Early Warning Smart Alert that detects change in the environment when too many anomalies occur, such as when too many metrics are outside their normal operating range. I can use the various dashboards and detail screens to view the metrics over time, so that instead of wondering whether the issue is the result of a capacity trend or something changing / breaking, I can look and quickly see, “Oh, there’s the problem. Something happened at 1:15 on system X that caused this service to really slow down.”
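The “too many anomalies” idea can be sketched in a few lines of Python as well. Again, this is only an illustration under assumptions of my own, not how the Early Warning Smart Alert is implemented. It assumes each metric already has a learned normal range, and that an alert fires when more than 20% of the checked metrics are outside theirs. The metric names, ranges and the 20% cut-off are hypothetical.

```python
def early_warning(readings, normal_ranges, max_fraction=0.2):
    """Return (alert, out_of_range_names).

    readings: dict of metric name -> current value
    normal_ranges: dict of metric name -> (low, high)
    alert is True when the share of checked metrics outside their normal
    range exceeds max_fraction (an assumed cut-off, chosen for the example).
    """
    checked = [name for name in readings if name in normal_ranges]
    if not checked:
        return False, []
    outside = [
        name for name in checked
        if not (normal_ranges[name][0] <= readings[name] <= normal_ranges[name][1])
    ]
    return len(outside) / len(checked) > max_fraction, sorted(outside)


# Hypothetical learned ranges and a snapshot of current readings.
normal_ranges = {
    "cpu_ready_ms": (0, 200),
    "disk_latency_ms": (0, 20),
    "response_time_s": (2.0, 4.0),
    "cache_hit_rate": (0.85, 1.0),
    "free_memory_gb": (4, 64),
}
readings = {
    "cpu_ready_ms": 150,
    "disk_latency_ms": 45,
    "response_time_s": 6.5,
    "cache_hit_rate": 0.91,
    "free_memory_gb": 12,
}

alert, culprits = early_warning(readings, normal_ranges)
print(alert, culprits)
```

Returning the names of the out-of-range metrics alongside the alert addresses the old combined-metric problem of not knowing which underlying metric tripped the alarm.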

Now, after more than 20 years in IT, I can finally start to use the multitude of metrics that have been there just waiting to be leveraged.

To get the most out of monitoring tools, consider using vCOps’ range of capabilities, including:

  • The ability to track KPIs within the infrastructure, such as Disk I/O or CPU Ready, or leverage the vSphere UI so that you know if your infrastructure has additional capacity or not.
  • Various KPI Super Metrics within the application stack (e.g. cache hit rate or available memory) that alert you when things are outside of a normal range.
  • The power to see exactly how an environment is performing on a given day, and the ability to isolate which component is the source of the issue.
  • The means to track and report the relative health of not only your components, but your services as well, without having to view everything as up or down at the component level and guess if the application or service is OK.

And it’s all possible because we can now actually use the lowly metric.
