
Tag Archives: IT organization

Staffing Your Cloud Organization – A Heuristic Model

Approximating staffing ratios in a cloud organization as a logarithmic function of infrastructure metrics.

By Pierre Moncassin

Customers who want to establish true cloud services based on VMware's SDDC solution (or any other provider's, for that matter) realize that in order to fully leverage the technology, they need to adapt their IT organization.

More specifically, they need to set up a dedicated team, a cloud Center of Excellence (COE), to manage and operate their cloud services.

The structure and roles of that team are described in detail in ‘Organizing for the Cloud’.

In practically all Operations Transformation projects, one question comes up: what is the optimum staffing level, in full-time equivalents (FTEs), for this cloud organization?

The standard consultant answer is, of course, 'it depends'. But in this blog, I will explain in more detail what 'it depends' means in this context.

In an earlier blog, I described 10 key factors for estimating staffing ratios to operate platforms with vRealize Automation and vRealize Operations Manager:

  • Number of lines of business
  • Number of data centers
  • Level of staff skill/experience
  • Number of cloud services
  • Workflow complexity
  • Internal process complexity (including support requirements, e.g. 5 days a week or 24/7)
  • Number of third party integrations
  • Rate of change
  • Number of VMs
  • Number of user dashboards/reports

These 10 factors, and probably hundreds of others, determine the complexity of the tasks the cloud organization needs to perform, and therefore the staffing level. Clearly there are thousands of possible combinations of these factors. But to see how the FTE count evolves with a single, easy-to-quantify parameter (such as the number of virtual machines or any other 'simple' infrastructure metric), we need to make strict assumptions to 'tie down' the other factors.

So let’s assume that we are looking at a single organization evolving over time; as time passes the number of virtual machines gradually increases, but so does the number and complexity of the services, as well as the demand for support coverage:

  1. Between 1 and 100 VMs, the COE is running as a pilot; there are no support requirements and only a small number of services to run.
  2. Between 100 and 1,000 VMs, the COE is running cloud services regionally with some basic service levels.
  3. Over, say, 30,000 VMs, the COE is running a global operation with a 24/7 support requirement and a broad range of services.

Practical observation of a number of real-life examples suggests an evolution broadly similar to the logarithmic curve in figure 1. Now this is still a model that deliberately simplifies and ‘smooths out’ the FTE curve, but there are two practical implications:

  • The staffing levels may rise most steeply at the beginning of the curve. When the organization transitions from a pilot to a fully operating COE, staffing needs rise significantly.
  • The FTE curve flattens out as the organization matures and can handle high volumes. Once the COE is operating with a high level of automation and experienced staff, adding workload requires only a marginal increase in the FTE count.

In reality, of course, the complexity – i.e. the demand on FTEs – never grows quite so smoothly.

We would see threshold effects. For example, when we reach 300 workloads, a new 24×7 service may be added to the portfolio, which requires a rapid increase in FTEs.
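To make this heuristic concrete, here is a minimal sketch in Python of how such a curve could be expressed: FTEs grow as a logarithmic function of VM count, with step increases at service-level thresholds. All coefficients and threshold values below are illustrative placeholders, not calibrated figures.

```python
import math

def estimate_fte(vm_count, base=2.0, scale=3.0, thresholds=None):
    """Heuristic FTE estimate: logarithmic growth with volume, plus
    step increases at service-level thresholds (e.g. adding a 24x7
    service). All coefficients are illustrative placeholders."""
    if thresholds is None:
        thresholds = {300: 2.0, 30000: 4.0}  # workload level -> extra FTEs
    fte = base + scale * math.log10(max(vm_count, 1))
    fte += sum(step for level, step in thresholds.items() if vm_count >= level)
    return round(fte, 1)

for vms in (10, 100, 1000, 30000, 100000):
    print(f"{vms:>7} VMs -> ~{estimate_fte(vms)} FTEs")
```

With these placeholder values, growing from 1,000 to 2,000 VMs adds less than one FTE, while crossing the 300-workload threshold adds a discrete step, matching the threshold effect just described.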

Take-aways:

  • The fastest rise in FTEs will occur in the early stages of building up cloud services; this is 'normal' given that the number of services and the service levels increase together, significantly raising the demands on the cloud organization.
  • Once well established and automated, the FTE level should only increase marginally with rising infrastructure volumes – your organization will have learned to cope with increasing quantities.
  • One caveat: although the FTE curve may look broadly logarithmic, threshold effects are inevitable. New demands on service levels (e.g. new compliance requirements or 24×7 support) can create an 'uptick' in FTEs without a prior 'uptick' in volumes.

What we have presented here is an intuitive model for understanding how increasing volumes impact FTEs. You are welcome to share your experience and perhaps refine this heuristic model.

—-
Pierre Moncassin is an operations architect with the VMware Operations Transformation global practice and is based in the UK.

Top 5 Tips for Organizing the Cloud

You’re ready to reap the rewards. Is your organization ready to deliver?

The technical and business advantages of the software-defined cloud era are well understood. But all too often a critical aspect of adopting the cloud model is overlooked: the organizational impact. The fact is the transition to the cloud changes roles, skills, processes and organizational structures. Yet, many IT leaders become so focused on the vision of the cloud—or its technological requirements—that they lose sight of whether their IT staff is properly prepared for the new world.

The resource, Top 5 Tips for Organizing the Cloud, will help you prepare for—and execute—a successful move to the cloud.

Green vs. Grey: Rethinking Your IT Operations

By Neil Mitchell

Many of the customers I work with wish, at least in the short term, to provide IT services internally rather than outsource. They are also considering introducing cloud services as a catalyst for transitioning away from traditional IT practices—to establish new practices and processes appropriate for effective and efficient cloud service delivery. They also recognise that this will require a fundamental shift in their approach and that they will need to modify their operating model and organisation structure to support this.

Picture the scenario: an existing IT organisation that has built up over many years, delivers services, and may follow traditional good-practice frameworks such as ITIL. However, chances are that the organisation is not optimised for the delivery of cloud services—it's probably siloed, may have developed many poor and labour-intensive practices along the way, and may not be perceived to deliver value. Sound familiar?

An option therefore is to re-invent IT. Create a new greenfield IT organisation with no legacy constraints. Can it be done? Of course, anything is theoretically possible. But is it realistic? What are the challenges?

Let’s make some assumptions for this new greenfield organisation:

  • New staff can be recruited—it won’t be necessary to transition staff from the old organisation.
  • There is the opportunity to establish new, optimised processes across the new organisation.
  • There will be no need to share information or systems between the old and the new organisations.
  • The new organisation can share the current data centres.

I break the challenges around this outwardly simple greenfield goal into two primary areas: 1) organisational, and 2) process and technology. And in my experience, there can be more questions than answers. However, I encourage my clients to address these issues before establishing what may be unachievable strategies.

1. Organisational challenges.

With any programme of change, such as that triggered by the introduction of cloud services, cultural change is the greatest challenge. You can argue that introducing a new greenfield organisation will overcome this—so let's take a closer look.

Firstly, staffing. Your old IT organisation will not disappear overnight. You would be introducing a new parallel or shadow organisation. Do you transition staff across or recruit from scratch? If you transition staff, legacy practices and behaviours will almost certainly migrate. Equally, if you do not transition staff, good practices may be lost. If you recruit from scratch, there is overall induction to the organisation and associated training to be considered—with inherent hidden costs, not to mention the additional employment obligations that come with recruiting. There may also be HR implications for the existing staff. New staff do not necessarily mean a new culture and better behaviours.

You will also need to determine whether you can use contractors, either as an interim measure to help establish the new cloud organisation or to backfill the existing one. Either is an option, but be aware that contractors' objectives may not align with the best interests of the company. You will need to expend effort to ensure that you obtain well-qualified, professional contractors who will be an asset to your company in establishing the cloud organisation.

Any of these options will create a bigger IT organisation—at least in the short term—for which you must budget.

Secondly, organisation structure. Should the new cloud organisation work within the existing organisation? Keeping it within the existing organisation structure runs the risk of undue influence by the "we've always done it this way" crowd, so it may not be truly greenfield. If it is a separate organisation or department, you will also need to determine whether to replicate management and back-office functions such as HR, Procurement, and IT Finance.

Another consideration is whether the new organisation is to be in a new location or within an existing site.  A new location certainly works from an isolation perspective and will reduce risk of migrating legacy practices, but you may also lose the positive benefits of interaction with the existing staff.

2. Process and technology challenges.  

Let’s start with the new cloud architecture and whether it will be stand-alone. Consider possible components: operating system, middleware, monitoring agents, and even application components. There may be corporate standards and configurations to be followed, which will need to be reviewed to ensure they are fit for purpose in the new environment. And, whether you set up a shadow support organisation or take advantage of existing expertise is yet another consideration.

Security, risk, and compliance concerns must also be addressed early on, as the new IT organisation will likely have to interact with the existing business to ensure the delivered services meet any regulatory or legal requirements. Your new IT organisation will not be thanked for delivering services that brought in unwelcome regulatory investigations!

IT service management brings up numerous considerations including whether to:

  • Create a parallel service desk vs. modify process and procedures for the existing service desk
  • Retain the single number your customer calls today vs. provide access to all new cloud services solely online via a service catalogue
  • Implement new parallel processes and even new systems for event, incident, problem, and change management vs. modify existing ones to account for a cloud-optimised approach

Correct provision of cloud services will introduce a major increase in policy-based automation and standardisation and result in opportunities for you to optimise operations, but not in isolation from the higher-level service management context.

Is a greenfield organisation really a practical option for you? It may be as an aspiration, but if you’re like most of the IT executives I work with, the reality is a little more grey.

So where to start?

VMware has built some of the largest and most successful public and private clouds in the world, and we thoroughly understand the opportunities and the challenges. My recommendation as a starting point: my colleague Kevin Lees, Principal Architect for VMware's global Operations Transformation Practice, recently updated his white paper, Organizing for the Cloud. The paper looks at the organisational impacts of transformation from multiple perspectives and provides insights and advice about how to prepare for—and execute—your winning transformation strategy.

====
Neil Mitchell is an operations architect with the VMware Operations Transformation global practice and is based in the UK.

4 Ways to Maximize the Value of VMware vRealize Operations Manager

By Rich Benoit

When installing an enterprise IT solution like VMware vRealize Operations Manager (formerly vCenter Operations Manager), supporting the technology implementation with people and process changes is paramount to your organization's success.

We all have to think about impacts beyond the technology any time we make a change to our systems, but enterprise products require more planning than most. Take, for example, the difference between installing VMware vSphere compared to an enterprise product. The users affected by vSphere generally sit in one organization, the toolset is fairly simple, little to no training is required, and time from installation to extracting value is a matter of days. Extend this thinking to enterprise products and you have many more users and groups affected, a much more complex toolset, training required for most users, and weeks or months from deployment to extracting real value from the product. Breaking it down like this, it’s easy to see the need to address supporting teams and processes to maximize value.

Here’s a recent example from a technology client I worked with that is very typical of customers I talk to. Management felt they were getting very little value from vRealize Operations Manager. Here’s what I learned:

  • Application dashboards in vRealize Operations Manager were not being used (despite extensive custom development).
  • The only team using the tool was virtual infrastructure (very typical).
  • They had not defined roles or processes to enable the technology to be successful outside of the virtual infrastructure team.
  • There was no training or documentation for ongoing operations.
  • The customer was not enabled to maintain or expand the tool or its content.

My recommendations were as follows, and this goes for anyone implementing vRealize Operations Manager:

  1. Establish ongoing training and documentation for all users.
  2. Establish an analyst role to define, measure, and report on processes and effectiveness related to vRealize Operations Manager, and to establish relationships with potential users and process areas for vRealize Operations Manager content.
  3. Establish a developer role to create and modify content based on the analyst’s collected requirements and fully leverage the extensive functionality vRealize Operations Manager provides.
  4. Establish an architecture board to coordinate an overall enterprise management approach, including vRealize Operations Manager.

The key takeaway here: IT transformation isn’t a plug-and-play proposition, and technology alone isn’t enough to make it happen. This applies especially to a potentially enterprise-level tool like vRealize Operations Manager. In order to maximize value and avoid it becoming just another silo-based tool, think about the human and process factors. This way you’ll be well on the way towards true transformational success for your enterprise.

—-
Rich Benoit is an Operations Architect with the VMware Operations Transformation global practice.

Is Your IT Organization Ready to Deliver?

By Kevin Lees

I recently updated the white paper I wrote a couple of years ago — Organizing for the Cloud — which has been quite popular with our customers. The good news:

  • It's shorter – condensed to focus on the areas that our customers have told us are the greatest help
  • The core concepts and models remain intact and have survived the test of time, and our customers continue to benefit from our best practice recommendations

From my perspective, there is no bad news; at least none I could come up with. IT leaders continue to validate with me that a new organizational approach as well as their people—and their roles and responsibilities—are more important than ever.

While I wrote that the core concepts and models have survived the test of time, that's not to say this is just a condensed version. I've updated a few sections based on new technical capabilities enabled by the SDDC and my experience working directly with customers, including:

  • The organizational impacts of the software-defined data center (SDDC) as the cloud infrastructure – including a couple of new roles
  • How to get started
  • An expanded section on key cross-team collaboration

Just to name a few.

Organizational change continues to be top of mind as IT executives implement SDDC as the infrastructure of choice for cloud as well as double down on their use of cloud as the future of IT. No matter what the intended topic of conversation when discussing the operational implications of SDDC and cloud, 9 out of 10 conversations I have with customers quickly turn to organizational implications.

Organizational change is a critical step to success in the new era of cloud. I hope you find this revision as useful as our customers found the original.

=====
Kevin Lees is principal architect for VMware’s global Operations Transformation Practice and is based in Colorado.

Transforming Operations and Perception of the IT Organization

By David Crane

A recent engagement with a long-established telecommunications firm presented a huge challenge—the solution for which is a great example of how operations transformation can drive technical transformation. The firm's customer base spans various global regions, each of which presented a different customer experience. The IT organization functioned in extremely siloed environments, having grown organically over 25 years to support an aging, fragmented infrastructure.

A frustrated but motivated CIO laid down the following requirements for the VMware consulting services team, to be met over an aggressive six-month timeline:

  • Reduce operational costs
  • Improve agility
  • Provide more service offerings
  • Help IT become a service broker and eliminate shadow IT
  • Build a flexible architecture to meet the needs of the business
  • Reduce total number of physical data centers
  • Gain more control and compliance of IT infrastructure environments

The internal IT team lacked the expertise and resources required to implement a software-defined data center (SDDC) solution. Their service request process was time-consuming, manual, and inconsistent. Add to that an average provisioning time for a full end-to-end server of eight weeks, and it’s no surprise that internal customers were seeking out external solution providers for their IT needs.

The VMware team set out to remedy all of this with the following solution:

  • Implement a production SDDC platform
  • Make self-service automated provisioning the first available service
  • Assess the customers’ operating processes
  • Introduce an optimized organizational structure
  • Integrate operations transformation and technical implementation
  • Take a phased approach to the project with clearly defined milestones to deliver immediate results
  • Ensure the VMware team worked closely with internal groups

Transforming the Operating Model
Breaking down the siloed IT organization, and introducing horizontal, cross-departmental communications was the first step to allow the customer to become service-focused.

The team did have the business analyst concept, but the analysts sat outside the IT organization. They didn’t understand IT and weren’t incentivized to do so. As a result, rogue users were going out and doing things themselves, leading to compliance and governance issues.

We introduced the concept of infrastructure operations and tenant operations. These were cross-functional teams that talk to each other—a virtual center of excellence within the IT organization. As part of this organizational change, we brought in new roles, the two most important being the customer relationship manager and the service owner. We brought customer relationship management back into IT, so the person in the role started to understand IT and what they could deliver (and how) against customer requirements.

One revelation was that customers did not really have an interest in availability. This was not because they didn't care, but because IT over the years has become robust enough that availability is simply expected. What customers really cared about was the speed and standardization of the service provisioning lifecycle, as this allowed them to respond quickly to market demands and support the business objective of being first to market with new products.

This led to a technical requirement: the IT organization's customers asked to see this information in a dashboard format, so that the provisioning process could be monitored proactively.

Transforming Infrastructure Operations
The service owners played a key role in pointing out that VMware vRealize Operations only looks at infrastructure—which resulted in a demand to change things within VMware vRealize Automation.

However, the dashboards needed to be delivered through vRealize Operations. To meet the technical requirement, we focused on the self-service provisioning portal and allowed consumers to monitor the status of their ordered services via that portal. To do that, we needed a dashboard in VMware vRealize Operations to monitor the KPIs involved in service provisioning. In order to build the dashboard to monitor provisioning time, we had to create a custom solution using vRealize Automation. The technical solution was necessary to enable the operating framework architecture and organizational model to support it.

Dashboard Solution
We ended up with a provisioned resources dashboard as shown in figure 1 below that lists each virtual machine (VM) and the number of minutes it took to be provisioned. Less than 30 minutes shows green, less than two hours shows yellow, and over two hours is red. It also shows the average, minimum, and maximum times to provision.

Figure 1: Provisioned resources dashboard (time to provision per VM)
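As a rough illustration of the dashboard's color-coding logic, here is a minimal sketch in Python. The VM names and timings are hypothetical; the 30-minute and two-hour boundaries come from the description above.

```python
from statistics import mean

# Boundaries from the dashboard description: under 30 minutes is
# green, under two hours is yellow, anything longer is red.
GREEN_MAX_MIN, YELLOW_MAX_MIN = 30, 120

def provisioning_status(minutes):
    if minutes < GREEN_MAX_MIN:
        return "green"
    if minutes < YELLOW_MAX_MIN:
        return "yellow"
    return "red"

# Hypothetical per-VM provisioning times in minutes
times = {"vm-001": 22, "vm-002": 45, "vm-003": 150}
for vm, t in sorted(times.items()):
    print(vm, t, provisioning_status(t))
print("avg/min/max:", round(mean(times.values()), 1),
      min(times.values()), max(times.values()))
```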

The dashboard also enabled the customer to use data to feed back into the service life cycle process. For example, they started to understand service demand. Service owners—who were expected to forecast demand for services—could now do so with more accuracy. Now that the team was forecasting capacity demand more accurately, they were able to increase credibility by sharing this information with the infrastructure team. And ultimately they saved money by having a better handle on demand.

The dashboard also allowed IT to develop proactive operational processes.  On several occasions the service owners started to see a degradation in performance of the provisioning process, while the infrastructure monitoring dashboards were still showing a healthy ecosystem.

Further analysis showed that changes to the underlying infrastructure, whilst staying within tolerance and SLA for the IT infrastructure teams, were having a cumulative impact further down the chain on the service provisioning process.

The provisioning dashboard and further integration with the customers’ service desk platform and event, incident, and problem management processes allowed the IT infrastructure teams to tune the change management process so that service provisioning would not be affected.

In the end, IT became service-oriented because of the dashboard. Because internal customers could use that tool to see the incredible accuracy with which the IT team was meeting its 30-minutes-or-less goal, it had a huge impact on the way IT was perceived within the business. IT's credibility skyrocketed, and suddenly it became easier to drive things like a "cloud first" policy within the organization.

======
David Crane is an operations architect with the VMware Operations Transformation global practice and is based in the U.K.

How to Avoid 5 Common Mistakes When Implementing an SDDC Solution

By Jose Alamo

Implementing a software-defined data center (SDDC) is much more than installing a set of technologies — an SDDC solution requires clear changes to the organization's vision, policies, processes, operations, and readiness. Today's CIO needs to spend a good amount of time understanding the business needs, the IT organization's culture, and how to establish the vision and strategy that will guide the organization to make the adjustments required to meet the needs of the business.

The software-defined data center is an open architecture that impacts the way IT operates today. As such, the IT organization needs to create a plan that will utilize the investments in people, process, and technology already made to deliver both legacy and new applications while meeting vital IT responsibilities. Below is a list of five common mistakes I've come across working with organizations that are implementing SDDC solutions, and my recommendations on how to avoid their adverse impacts:

1. Failure to develop the vision and strategy—including the technology, process, and people aspects
Many times organizations implement solutions without setting the right expectation and a clear direction for the program. The CIO must use all the resources available within the IT organization to create a vision and strategy, and in some cases it is necessary to bring in external resources that have experience in the subject. The vision and strategy must align with the business needs, and it should identify the different areas that must be analyzed to ensure a successful adoption of an SDDC solution.

In my experience working with clients, it is imperative that as part of the planning a full assessment is conducted, and it must include the areas of people, process, and technology. A SWOT analysis should also be completed to fully understand the organization’s strengths,  weaknesses, opportunities, and threats. Armed with this insight, the CIO and IT team will be able to express the direction that must be taken to be successful, including the changes required across people, process, and technology.

Failing to complete this step will add complexity and lack of clarity for those who will be responsible for implementing the solution.

2. Limited time spent reviewing and understanding the current policies
There are often many policies within the IT organization that can prevent moving forward with the implementation of SDDC solutions. In such cases, the organization needs to have an in-depth review of the current policies governing the business and IT day-to-day operations. The IT team also needs to ensure it devotes a significant amount of time with the company’s security and compliance team to understand their concerns and what measures need to be taken to make the necessary adjustments to support the implementation of the solutions. For example, the IT organization needs to look at its change policies; some older policies could prevent the deployment of process automation that is key to the SDDC solution. When these issues are identified from the beginning, IT can start the negotiation with the lines of business to either change its policies or create workarounds that will allow the solution to provide the expected value.

Performing these activities at the beginning of the project will allow IT leadership to make smart choices and avoid delays or workarounds when deploying future SDDC solutions.

3. Lack of maturity around the IT organization’s service management processes
The software-defined data center redefines IT infrastructure and enables the IT organization to combine technology and a new way of operating to become more service-oriented and more focused on business value. To support this transformation, mature service management processes need to be established.

After the assessment of current processes, the IT organization will be able to determine which processes will require a higher level of maturity, which will need to be adapted to the SDDC environment, and which are missing and will need to be established in order to support the new environment.

Special attention will be required for the following processes:  financial management, demand management, service catalog management, service level management, capacity management, change management, configuration management, event management, request fulfillment, and continuous service improvement.

Ensure ownership is identified for each process, with KPIs and measurable metrics established—and keep the IT team involved as new processes are developed.

4. Managing the new solution as a retrofit within the current environment
Many IT organizations will embrace a new technology and/or solution only to attempt to retrofit it into their current operational model. This is typically a major mistake, especially if the organization is expecting better efficiency, more flexibility, lower cost to operate, transparency, and tighter compliance as potential benefits from an SDDC.

Organizations must assess their current requirements and determine if they will be required for the new solutions. Most processes, roles, audit controls, reports, and policies are in place to support the current/legacy environment, and each must be assessed to determine its purpose and value to the business, and to determine whether it is required for the new solution.

IT leadership should ask themselves: If the new solution is going to be retrofitted into the current operational model, then why do we need a new solution?  What business problems are we going to resolve if we don’t change the way we operate?

My recommendation to my clients is to start lean, minimize the red tape, reduce complex processes, automate as much as possible, clearly identify new roles, implement basic reporting, and establish strict change policies. The IT organization needs to commit to minimize the number of changes to the new solution to ensure only changes that are truly required get implemented.

5. No assessment of the IT organization’s capabilities and no plan to fill the skill set gaps
The most important resource to the IT organization is its people. IT management can implement the greatest technologies, but their organizations will not be successful if their people are not trained and empowered to operate, maintain, and enhance the new solution.

The IT organization needs to first assess current skill sets. Then work with internal resources and/or vendors to determine how the organization needs to evolve in order to achieve its desired state. Once that gap has been identified, the IT management team can develop an enablement plan to begin to bridge the gap. Enablement plans typically include formal “train the trainer” models to cascade knowledge within the organization, as well as shadowing vendors for organizational insight and guidance along with knowledge transfer sessions to develop self-sufficiency. In some cases it may be necessary to bring in external resources to augment the IT team’s expertise.

In conclusion, implementing a software-defined data center solution will require a new approach to implementing processes, technologies, skill sets, and even IT organizational structures. I hope these practical tips on how to avoid common mistakes will help guide your successful SDDC solution implementations.

====
Jose Alamo is a senior transformation consultant with VMware Accelerate Advisory Services and is based in Florida. Follow Jose on Twitter @alamo_jose  or connect on LinkedIn.

10 Factors to Consider When Estimating IT Staff Ratios Needed to Operate a Cloud Platform

By Pierre Moncassin

In this post, I want to share with you some "rule of thumb" estimates on how many full-time equivalent (FTE) positions an IT organization may need to operate a cloud platform. Note: this is not an exact science, so I wanted to give you the practitioner's approach. What are the general guidelines? What do I need to take into account?

Readers can learn more specific details about the different roles in the cloud management team in the VMware white paper "Organizing for the Cloud" as a starter. Here I use the generic terms "administrator" and "operator" to broadly describe the technicians/analysts/operators who manage and configure the tools on a daily basis. Here's my list of factors to consider when estimating IT staff ratios:

  1. Number of lines of business. It stands to reason that the higher the number of distinct business units (lines of business) using the cloud, the higher the number and complexity of workflows to support, the more user profiles to manage, reports to produce, and so forth.
  2. Number of data centers. If the toolsets must manage multiple data centers, there will be added complexity in order to manage multiple environments, which often are in different locations.
  3. Level of staff skill/experience. The more experienced the operators, the larger and more complex the infrastructure they can manage. In other words, IT should require fewer FTEs to manage the same level of complexity in a cloud infrastructure. (This is a topic that deserves a separate article: "How the IT Organization Learns to Use Cloud Management Tools — and Over Time.")
  4. Number of services. By this I mean cloud-type services, as in IT-as-a-service or applications. As a starter, determine how many services will be offered in the cloud service catalog.
  5. Workflow complexity. Factor in the internal complexity of the automated workflows. For example, on a scale of 1-5 (5 being most complex), a workflow with multiple approval points might score as 5, whereas a basic workflow as 1.
  6. Internal process complexity. Within IT, the organization with a higher number of mandatory internal process steps (which might all be in place for good reason) will likely need more staff (or it will take their staff longer) to carry out the same tasks as the organization with fewer internal process steps. A higher degree of complexity often develops in highly regulated environments, be it defense or civil administrations, or where an outsourcing provider requires rigid contractual relationships with inflexible approvals. Process and workflow complexity are related but separate considerations (all processes are not automated into workflows).
  7. Number of third-party integrations. The more integrations that need to be built into the automation workflows, the higher the workload for the operators.
  8. Rate of change. Change may be due to business change (mergers, acquisitions, new products, new applications), but also technological change (such as internal transformation programs). These may impact FTE requirements.
  9. Number of virtual machines under management. It may help to group into broad ranges: less than 100, 100 to 1,000, 1,000 to 10,000, and above 10,000. That range will impact FTE requirements.
  10. Number of user dashboards/reports to maintain. This can range from a couple basic reports to dozens of dashboards and complex reports. If the reporting is not sufficiently automated, the “unfortunate” administrators may need to spend a substantial part of their time producing custom reports for various user groups.

For those readers keen on modeling, each factor can quite easily be prorated on a 1-to-5 scale and turned into a formula. Others may be satisfied with applying the factors as a simple rule of thumb.
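For illustration, here is a minimal sketch of what such a formula could look like: each factor rated on a 1-to-5 scale, weighted, and averaged into a single staffing-complexity score. The weights and ratings are hypothetical placeholders to be calibrated against your own metrics.

```python
# Hypothetical weights for the ten factors above; calibrate against
# your own staffing metrics before drawing conclusions.
FACTOR_WEIGHTS = {
    "lines_of_business": 1.0, "data_centers": 1.0, "staff_skill": 1.5,
    "services": 1.0, "workflow_complexity": 1.2, "process_complexity": 1.2,
    "integrations": 1.0, "rate_of_change": 1.0, "vm_count": 1.5,
    "reports": 0.8,
}

def staffing_score(ratings, weights=FACTOR_WEIGHTS):
    """Weighted average of 1-5 ratings; multiply by a locally
    calibrated factor to convert the score into FTEs."""
    total = sum(weights[f] * ratings[f] for f in weights)
    return total / sum(weights.values())

ratings = {f: 3 for f in FACTOR_WEIGHTS}  # mid-range example ratings
ratings["staff_skill"] = 2  # experienced staff: rate low, as skill reduces FTE demand
print(round(staffing_score(ratings), 2))  # ~2.87 on the 1-5 scale
```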

My approach can be extended to VMware vRealize Automation or vRealize Operations management products, as well as other management tools. Stay tuned for a future article, as I am also working to break down the roles far more precisely than "administrators."

Meanwhile, consider the above factors I’ve outlined as basic guidelines. And a call to action for practitioners: Compare my guidelines to your metrics, and send me your feedback!

—-
Pierre Moncassin is an operations architect with the VMware Operations Transformation global practice and is based in the UK.

Leveraging Proactive Analytics to Optimize IT Response

By Rich Benoit

While ushering in the cloud era means a lot of different things to a lot of different people, one thing is for sure: operations can't stay the same. To leverage the value and power of the cloud, IT organizations need to:

  1. Solve the challenge of too many alerts with dynamic thresholds
  2. Collect the right information
  3. Understand how to best use the new alerts
  4. Improve the use of dynamic thresholds
  5. Ensure the team has the right roles to support the changing environment

These steps can often be addressed by using the functionality within VMware vRealize Operations Manager, as described below.

1) Solve the challenge of too many alerts with dynamic thresholds
In the past, when we tried to alert based on the value of a particular metric, we found that it tended to generate too many false positives. Since false positives tend to lead to alerts being ignored, we would raise the hard threshold for the alert until we no longer got false positives. The problem is that users would then call in before the alert actually triggered, defeating the purpose of the alert in the first place. As a result, we tended to monitor very few metrics because of the difficulty in finding a satisfactory result.

However, now we can leverage dynamic thresholds generated by analytics. These dynamic thresholds identify the normal range for a wide range of metrics according to the results of competing algorithms that best try to model the behavior for each metric over time. Some algorithms are based on time such as day of the week, while others are based on mathematical formulas. The result is a range of expected behavior for each metric for a particular time period.

One of the great use cases for dynamic thresholds is that they identify the signature of applications. For example, they can show that the application always runs slow on Monday mornings or during month-end processing. Each metric outside of the normal signature constitutes an anomaly. If enough anomalies occur, an early warning smart alert can be generated within vRealize Operations Manager that indicates that something has changed significantly within the application and someone should investigate to see if there’s a problem.
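As a simplified illustration of the idea (not the actual competing algorithms vRealize Operations Manager selects per metric), here is a sketch that learns a normal range per time bucket from historical samples and flags values outside that range as anomalies:

```python
from collections import defaultdict
from statistics import mean, stdev

def dynamic_thresholds(samples, k=2.0):
    """Learn a normal range per time bucket (e.g. hour of week) as
    mean +/- k standard deviations. A deliberately simplified
    stand-in for the competing per-metric algorithms described above."""
    buckets = defaultdict(list)
    for bucket, value in samples:
        buckets[bucket].append(value)
    return {b: (mean(v) - k * stdev(v), mean(v) + k * stdev(v))
            for b, v in buckets.items() if len(v) > 1}

def is_anomaly(bucket, value, thresholds):
    low, high = thresholds[bucket]
    return not (low <= value <= high)

# Hypothetical Monday-9am response times (ms) for one transaction
history = [("mon-09", v) for v in (210, 225, 198, 240, 215)]
ranges = dynamic_thresholds(history)
print(is_anomaly("mon-09", 400, ranges))  # True: outside the learned range
```

In this toy version, enough anomalies across many metrics would be the trigger for an early warning alert; the real engine weighs multiple models per metric and picks the best fit.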

2) Collect the right information
As we move from more traditional, client-server era environments to cloud era environments, many teams still use monitoring that has been optimized for the previous era (and tends to be siloed and component-based, too).

It’s not enough to just look at what’s happening with a particular domain or what’s going on with up-down indicators. In the cloud era, you need to look at performance that’s more aligned with the business and the user experience, and move away from a view focused on a particular functional silo or resource.

By putting those metrics into a form that an end user can relate to, you can give your audience better visibility and improve their experience. For example, if you were to measure the response time of a particular transaction, when a user calls in and says, “It’s slow today,” you can check the dynamic thresholds generated by the analytics that show the normal behavior for that transaction and time period. If indeed the response times are within the normal range, you can show the user that although the system may seem slow, it’s the expected behavior. If on the other hand the response times are higher than normal, a ticket could be generated for the appropriate support team to investigate. Ideally, the system would have already generated an alert that was being researched if a KPI Smart Alert had been set up within vRealize Operations Manager for that transaction response time.

3) Understand how to best use the new alerts

You may be wondering: Now that I have these great new alerts enabled by dynamic thresholds, how can I best leverage them? Although they are far more actionable than previous metric-based alerts, the new alerts may still need some form of human interaction to make sure the proper action is taken. For example, it is often suggested that when a particular cluster in a virtualized environment starts having performance issues, an alert should be generated to burst its capacity. The problem with this approach is that although performance issues can indicate a capacity issue, they can also indicate a break in the environment.

The idea is to give the user as much info as they need when an alert is generated to make a quick, well-informed decision and then have automations available to quickly and accurately carry out their decision. Over time, automations can include more and more intelligence, but it’s still hard to replace the human touch when it comes to decision making.

4) Improve the use of dynamic thresholds
A lot of monitoring tools are used after an issue materializes. But implementing proactive processes gives you the opportunity to identify or fix an issue before it impacts users. It’s essential that the link to problem management be very strong so processes can be tightly integrated, as shown in figure 1.

Figure 1: Event, incident, problem cycle

During the Problem Management Root Cause Analysis process, behaviors or metrics are often identified that are leading indicators for imminent impacts to the user experience. As mentioned earlier, vRealize Operations Manager, as the analytics engine, can create both KPI and Early Warning smart alerts, at the infrastructure, application, and end-user level to alert on these behaviors or metrics. By instrumenting these key metrics within the tool you can create actionable alerts in the environment.

5) Ensure the team has the right roles to support the changing environment.
With the newfound abilities enabled by an analytics engine like vRealize Operations Manager, the roles and their structure become more critical. As shown in figure 2 below, the analyst role should be there to identify and document opportunities for improvement, as well as report on the KPIs that indicate the effectiveness of the alerts already in place. In addition, developers are needed to develop new alerts and other content within vRealize Operations Manager.

Figure 2: New roles to support the changing environment

In a small organization, one person may be performing all of these functions, while in a larger organization, an entire team may perform a single role. This structure can be flexible depending on the size of the organization, but these roles are all critical to leveraging the capabilities of vRealize Operations Manager.

By implementing the right metrics, right KPIs, right level of automation, and putting the right team in place, you’ll be primed for success in the cloud era.

—-
Richard Benoit is an Operations Architect with the VMware Operations Transformation global practice.

5 Steps to Shape Your IT Organization for the Software-Defined Data Center

By Tim Jones

One aspect of the software-defined data center (SDDC) that is not solved through software and automation is how to support what is being built. The abstraction of the data center into software managed by policy, integrated through automation, and delivered as a service directly to customers requires a realignment of the existing support structure.

The traditional IT organizational model does not support bundling compute, network, storage, and security into easily consumable packages. Each of these components is owned by a separate team with its own charter and with management chains that don’t merge until they reach the CTO. The storage team is required to support the storage needs of the virtualized environment as well as physical servers, the backup storage, and replication of data between sites. The network team has core, distribution, top of rack, and edge switches to support in addition to any routers or firewalls. And someone has to support the storage network whether it is IP, InfiniBand, or Fibre Channel. None of these teams has only the software-defined data center to support. The next logical question asked is: What does an organization look like that can support SDDC?

While there is no simple answer that allows you to fill a specific set of roles with staff possessing skill sets from a checklist, there are many organizational models that can be modified to support your SDDC. In order to modify an organizational model, or to build your own to meet your IT organization's requirements, certain questions need to be answered. Working through the following five steps will help shape your new organizational model:

  1. Define what your new IT organization will offer.
    Although this sounds elementary, it is necessary to understand what is planned to be offered in order to know what is necessary to provide support. Will infrastructure as a service (IaaS) be the only offering, or will database as a service (DBaaS) and platform as a service (PaaS) also be offered? Does support stop at the infrastructure layer, or will operating system, platform, or database support be required? Who will the customer work with to utilize the services or to request and design additional services?
  2. Identify the existing organizational model.
    A thorough understanding of the existing support structure will help identify what support customers will expect based on their current experience, and any challenges associated with the model. Are there silos within it that negatively impact customers? What skills currently exist in the organization? Identifying the existing organization and defining what the new organization will offer will help to identify what gaps exist.
  3. Leverage what is already working.
    If there are components of the existing organization that can either be replicated or consumed by the new organization, take advantage of the option. For example, if there is already a functioning group that works with the customers and supports the operating system, then evaluate how to best incorporate them into the new organization. Or if certain support is outsourced, then incorporate that into the new organizational model.
  4. Evaluate beyond the technical.
    The inclusion of service architects, process designers, business analysts, and project managers can be critical to the success of your new organization. These resources could be consumed from existing internal groups such as a central PMO. But overlooking the non-technical organizational requirements can inhibit the ability of the IT organization to deliver on its service roadmap.
  5. Create a new IT organization.
    Don’t accept the status quo with your current organization. If the storage, compute, and virtualization teams all report through separate management chains in the current organization, the new organization should leverage a single management chain for all three teams. Removing silos within the IT organization fosters a collaborative spirit that results in better support and better service offerings for customers.

Although there is no one-size-fits-all organizational model for the software-defined data center, understanding where your IT organization is currently and where it is headed will enable you to create an organizational model capable of supporting the service roadmap.

====
Tim Jones is business transformation architect with VMware Accelerate Advisory Services and is based in California.