
Monthly Archives: November 2014

It’s Time for IT to Come Out of the Shadows

Chances are shadow IT is happening right now at your company. No longer content to wait for their companies’ IT help, today’s employees are taking matters into their own hands, finding and using their own technology to solve work challenges as they arise. It’s a trend that isn’t likely to fade into the shadows anytime soon.


10 Factors to Consider When Estimating IT Staff Ratios Needed to Operate a Cloud Platform

By Pierre Moncassin

In this post, I want to share some “rule of thumb” estimates of how many full-time equivalent (FTE) positions an IT organization may need to operate a cloud platform. This is not an exact science, so I will take a practitioner’s approach: What are the general guidelines? What do I need to take into account?

As a starting point, readers can find more detail on the different roles in the cloud management team in the VMware white paper “Organizing for the Cloud.” Here I use the generic terms “administrator” and “operator” to broadly describe the technicians and analysts who manage and configure the tools on a daily basis. Here’s my list of factors to consider when estimating IT staff ratios:

  1. Number of lines of business. It stands to reason that the higher the number of distinct business units (lines of business) using the cloud, the higher the number and complexity of workflows to support, and the more user profiles to manage, reports to produce, and so forth.
  2. Number of data centers. If the toolsets must manage multiple data centers, there will be added complexity in order to manage multiple environments, which often are in different locations.
  3. Level of staff skill/experience. The more experienced the operators, the larger and more complex the infrastructure they can manage. In other words, IT should require fewer FTEs to manage the same level of complexity in a cloud infrastructure. (This is a topic that deserves a separate article: “How the IT Organization Learns to Use Cloud Management Tools — and Over Time.”)
  4. Number of services. By this I mean cloud-type services, as in IT-as-a-service or applications. As a starter, determine how many services will be offered in the cloud service catalog.
  5. Workflow complexity. Factor in the internal complexity of the automated workflows. For example, on a scale of 1 to 5 (5 being most complex), a workflow with multiple approval points might score a 5, whereas a basic workflow would score a 1.
  6. Internal process complexity. Within IT, an organization with a higher number of mandatory internal process steps (which might all be in place for good reason) will likely need more staff, or take longer, to carry out the same tasks as an organization with fewer internal process steps. A higher degree of complexity often develops in highly regulated environments, such as defense or civil administration, or where an outsourcing provider requires rigid contractual relationships with inflexible approvals. Process and workflow complexity are related but separate considerations (not all processes are automated into workflows).
  7. Number of third-party integrations. The more integrations that need to be built into the automation workflows, the higher the workload for the operators.
  8. Rate of change. Change may be due to business change (mergers, acquisitions, new products, new applications), but also technological change (such as internal transformation programs). These may impact FTE requirements.
  9. Number of virtual machines under management. It may help to group into broad ranges: less than 100, 100 to 1,000, 1,000 to 10,000, and above 10,000. That range will impact FTE requirements.
  10. Number of user dashboards/reports to maintain. This can range from a couple of basic reports to dozens of dashboards and complex reports. If the reporting is not sufficiently automated, the “unfortunate” administrators may need to spend a substantial part of their time producing custom reports for various user groups.

For readers keen on modeling, each factor I’ve listed can easily be prorated on a 1-to-5 scale and turned into a formula, as sketched below. Others may be content to apply the factors as simple rules of thumb.
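For illustration, here is a minimal Python sketch of one way such a formula could look, with each factor prorated on a 1-to-5 scale. The baseline and the weight per point are assumptions chosen for the example, not validated staffing benchmarks, so treat the output as a starting point for discussion rather than a sizing answer.

```python
# Illustrative only: prorate each of the ten factors on a 1-5 scale and
# combine them into a rough FTE estimate. The baseline and weight are
# assumptions for the sake of the example, not validated benchmarks.

FACTORS = [
    "lines_of_business", "data_centers", "staff_experience", "services",
    "workflow_complexity", "process_complexity", "integrations",
    "rate_of_change", "vms_under_management", "reports",
]

def estimate_ftes(scores, baseline_ftes=2.0, weight_per_point=0.25):
    """Return a rough FTE estimate from 1-5 scores for each factor.

    scores maps factor name -> score from 1 (simplest) to 5 (most complex).
    Staff experience works the other way around: more experience means
    fewer FTEs, so its score is inverted before it is added to the total.
    """
    total = 0
    for name in FACTORS:
        score = scores[name]
        if name == "staff_experience":
            score = 6 - score   # experienced teams need fewer people
        total += score - 1      # a score of 1 adds nothing to the baseline
    return baseline_ftes + total * weight_per_point

# Example: a mid-sized environment as scored by a practitioner.
example = {name: 3 for name in FACTORS}
example["vms_under_management"] = 4   # 1,000 to 10,000 VMs
print(round(estimate_ftes(example), 1))
```

Adjust the weights to reflect your own environment; the point is simply that the factors can be scored and combined consistently.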

This approach can be extended to the VMware vRealize Automation and vRealize Operations management products, as well as other management tools. Stay tuned for a future article, as I am also working to break down the roles far more precisely than “administrators.”

Meanwhile, consider the above factors I’ve outlined as basic guidelines. And a call to action for practitioners: Compare my guidelines to your metrics, and send me your feedback!

—-
Pierre Moncassin is an operations architect with the VMware Operations Transformation global practice and is based in the UK.

Transforming Operations to Optimize DevOps

By Ahmed Al-Buheissi

DevOps. It’s the latest buzzword in IT and, as usual, the industry is either skeptical or confused about its meaning. In simple terms, DevOps is an approach that allows IT organizations to develop and release software rapidly. By acknowledging the pressure the Development and Operations teams within IT place on each other, DevOps enables the two teams to work closely together. IT organizations put policies for shared and delegated responsibilities in place, with an emphasis on communication, collaboration, and integration.

Developers have no problem writing code and pushing it out; however, their demand for infrastructure causes conflict with the Operations team. Traditionally it is the Operations team that releases code to the various environments, including Development, Test, UAT, and Production. As developers want to continuously push functionality through those environments, it is only natural that Operations gets inundated with requests for more infrastructure. Add Quality Assurance teams into the mix, and efficiency suffers further.

Why the rush to release code?
Rapid application development has become a requirement. The face of IT is changing quickly and will continue to change even faster. Businesses need to innovate fast and introduce products and services to market to beat the competition and meet the demands of their customers.

Here are four reasons rapid application development and release is fundamental:

  1. This is the social media age. Bad code and bugs can no longer be ignored and scheduled for future major releases; when defects are found, word will spread fast through Twitter and blogs.
  2. Mobile applications are changing the way we work and require a different kind of design—one that fits on a smaller screen and is intuitive. If a user doesn’t like one application, they’ll download the next.
  3. Much of the software developed today is modular and highly dependent on readily available modules and packages. When an issue is discovered with a particular module, word spreads fast among user communities, and solutions need to be developed immediately.
  4. Last and most important, this is the cloud era. The very existence of the Operations team is at stake, because if it cannot provide infrastructure when Development needs it, developers will opt to use a publicly available cloud service. It is that easy.

So what is DevOps again?
DevOps is not a “something” that can be purchased; it’s an approach that requires new ways of working as an IT organization. As an IT leader, you will need to “operationalize” your Development team and bring it closer to your Operations team. For example, your developers will need the capability to provision infrastructure based on new operations policies. DevOps also means moving some development functions into the Operations team. For example, Operations will need to start writing the workflows and associated scripts that automate the deployment process for the Development team, as sketched below.
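As a rough illustration of the kind of workflow Operations might author and hand to Development, here is a minimal Python sketch. The step functions and environment names are hypothetical placeholders for this example, not any particular product’s API.

```python
# Hypothetical sketch of a deployment workflow authored by the Operations
# team and triggered by developers for the lower environments. The step
# functions are placeholders, not a real API.

def provision_infrastructure(environment):
    print(f"[{environment}] provisioning VM, network, and storage from the approved blueprint")

def deploy_build(environment, build_id):
    print(f"[{environment}] deploying build {build_id}")

def run_smoke_tests(environment):
    print(f"[{environment}] running automated smoke tests")
    return True

def deployment_workflow(environment, build_id):
    """Ops-authored workflow that Development can run without raising a request."""
    provision_infrastructure(environment)
    deploy_build(environment, build_id)
    if not run_smoke_tests(environment):
        raise RuntimeError(f"Smoke tests failed in {environment}")

# A developer pushing the latest build into the test environment:
deployment_workflow("test", build_id="1.4.2")
```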

While there are adequate tools that will facilitate the journey to DevOps, DevOps is more about processes and people.

How to implement DevOps
The IT organization needs to undergo both people and process changes to implement DevOps, and it cannot happen all at once; the change needs to be gradual. It is also very difficult to measure “DevOps maturity.” As an IT leader, you will know it when your organization becomes DevOps capable: your developers have the necessary tools to release software at the speed of business, and your Operations team is focused on innovation rather than reacting to infrastructure deployment requests.

Also, your test environment will evolve to a “continuous integration” environment, where developers can deploy their code and have it tested in an automated and continuous process.

I make the following recommendations to my clients for process, people, and tools required for a DevOps approach:

Process
The diagram below illustrates a process for DevOps, in which the Operations team develops automated deployment workflows, and the Development team uses those workflows to deploy to the Test and UAT environments. The final deployment to production is carried out by the Operations team; in fact, Operations should continue to be the only team with direct access to production infrastructure.

Figure: Service Release Process – Service Access Validation

However, it is critical that Development have access to monitoring tools in production. These tools may track application performance and its impact on underlying infrastructure resources, network response, and server/application log files. With them, your developers can monitor the performance of their applications and diagnose issues without having to consume Operations resources.

Finally, it is assumed that the DevOps tools and workflows will be used for all deployments, including production. This means that the Development and Operations teams must use the same tools to deploy to all environments, both to ensure consistency and continuity and to “rehearse” the production release.
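Here is a minimal sketch of that principle, with hypothetical role and environment names: the same workflow runs for every environment, and production is simply gated to the Operations role, so the production deployment repeats what was already rehearsed in Test and UAT.

```python
# Hypothetical sketch: one deployment entry point for every environment.
# Production is gated to the Operations role, but the steps executed are
# identical to those rehearsed in Test and UAT.

def run_release_workflow(environment, release):
    # Placeholder for the shared, automated deployment steps.
    print(f"Deploying release {release} to {environment} with the standard workflow")

def deploy(environment, release, role):
    if environment == "production" and role != "operations":
        raise PermissionError("Production deployments are performed by Operations only")
    run_release_workflow(environment, release)

deploy("test", "2.1.0", role="development")
deploy("uat", "2.1.0", role="development")
deploy("production", "2.1.0", role="operations")
```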

People

The following roles are the main players in facilitating a DevOps approach:

  • Operations: The DevOps process starts with the Operations team. Its first responsibility is to develop workflows that will automate the deployment of a complete application environment. To develop these workflows, Operations must be involved earlier in the development cycle and will therefore have to move closer to Development in order to understand Development’s infrastructure requirements.
  • Development: The Development team will use their development environment to determine the infrastructure required for the application; for example database version, web server type, and application monitoring requirements. This information will assist the Operations team in determining the capacity required and in developing the deployment workflows. It will help with implementing the custom dashboards and metrics reporting capabilities Development needs to monitor their applications. The Development team will be able to develop and deploy to the “continuous integration” and UAT environments without having to utilize Operations resources. They can “rip and replace” applications to these environments as many times as needed by QA and end-users in order to be production-ready.
  • Quality Assurance (QA): Because automated test scripts carry much of the testing load in such an environment, the QA team can play a lesser role, spot-checking applications rather than testing every release. QA will also need to test and verify the deployment workflows to ensure the infrastructure configuration matches the design.
  • End Users: End-user testing can likewise be reduced in a DevOps environment to random spot checks. However, once DevOps is in place, end users should notice a vast improvement in the quality and speed of the applications produced.

Tools
VMware vRealize™ Code Stream™ targets IT organizations that are adopting DevOps to accelerate application releases for business agility. Its features include:

  • Automation and governance of the entire application release process
  • A dashboard for end-to-end visibility of the release process across Development and Operations organizations
  • Artifact management and tracking

For IT leaders, vRealize Code Stream can help transform the IT organization through a DevOps approach. The “continuous integration” cycle is a completely automated package that will deploy, validate, and test applications being developed.

A DevOps approach can also benefit greatly from platform-as-a-service (PaaS) providers. Developing and releasing software on a PaaS guarantees consistency, because the platform layer (as well as the layers beneath it) is always the same. Pivotal CF, for example, allows users and DevOps teams to publish and manage applications running on the Cloud Foundry platform across distributed infrastructure.

Conclusion
Although DevOps is a relatively new concept, it’s really just the next step after agile software development methods. As the workforce becomes more mobile and social media brings customers and users closer, IT organizations must be able to quickly release applications and adapt to changing market dynamics. (Learn how the VMware IT DevOps teams are using the cloud to automate dev/test provisioning and streamline application development in the short video below.)

Many organizations have tackled the issues associated with running internal development teams by outsourcing software development. I now see the reverse happening, as organizations want to reach the market more quickly and have started to build internal development teams again.

For the majority of my clients, it’s not a matter of “if” but of “how quickly” they will introduce DevOps. By adopting DevOps principles, their development teams will be able to release features as the business demands them, at the speed of business.

====
Ahmed Al-Buheissi is an operations technical architect with the VMware Operations Transformation global practice and is based in Melbourne, Australia.

 

Leveraging Proactive Analytics to Optimize IT Response

By Rich Benoit

While ushering in the cloud era means a lot of different things to a lot of different people, one thing is for sure: operations can’t stay the same. To leverage the value and power of the cloud, IT organizations need to:

  1. Solve the challenge of too many alerts with dynamic thresholds
  2. Collect the right information
  3. Understand how to best use the new alerts
  4. Improve the use of dynamic thresholds
  5. Ensure the team has the right roles to support the changing environment

These steps can often be addressed by using the functionality within VMware vRealize Operations Manager, as described below.

1) Solve the challenge of too many alerts with dynamic thresholds
In the past, when we tried to alert on the value of a particular metric, we found that it tended to generate too many false positives. Since false positives lead to alerts being ignored, we would raise the hard threshold for the alert until we no longer got false positives. The problem is that users would then call in before the alert actually triggered, defeating the purpose of the alert in the first place. As a result, we tended to monitor very few metrics because of the difficulty of finding a satisfactory threshold.

Now, however, we can leverage dynamic thresholds generated by analytics. Dynamic thresholds identify the normal range for a wide set of metrics by running competing algorithms that each try to model the behavior of a metric over time. Some algorithms are based on time, such as day of the week, while others are based on mathematical formulas. The result is a range of expected behavior for each metric for a particular time period.

One of the great use cases for dynamic thresholds is that they identify the signature of applications. For example, they can show that the application always runs slow on Monday mornings or during month-end processing. Each metric outside of the normal signature constitutes an anomaly. If enough anomalies occur, an early warning smart alert can be generated within vRealize Operations Manager that indicates that something has changed significantly within the application and someone should investigate to see if there’s a problem.
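To make the idea concrete, here is a minimal, generic Python sketch of a dynamic threshold and an early-warning check. The per-time-slot normal ranges and the anomaly budget are illustrative assumptions, not how vRealize Operations Manager computes them internally.

```python
# Illustrative only: learn a "normal range" per metric and time slot from
# history, then flag anomalies and raise an early warning when too many
# metrics fall outside their expected range at the same time.
from statistics import mean, stdev

def normal_range(history, k=3.0):
    """Expected range for one metric in one time slot (e.g., Monday 9-10 a.m.)."""
    m, s = mean(history), stdev(history)
    return (m - k * s, m + k * s)

def is_anomaly(value, history):
    low, high = normal_range(history)
    return not (low <= value <= high)

def early_warning(current_values, histories, anomaly_budget=5):
    """Raise an early warning when enough metrics behave abnormally at once."""
    anomalies = [name for name, value in current_values.items()
                 if is_anomaly(value, histories[name])]
    return len(anomalies) >= anomaly_budget, anomalies

# Example: Monday-morning history for a couple of application metrics.
histories = {
    "cpu_pct":       [38, 42, 40, 45, 41, 39, 43],
    "db_latency_ms": [12, 14, 13, 15, 12, 13, 14],
}
now = {"cpu_pct": 44, "db_latency_ms": 95}
fire, which = early_warning(now, histories, anomaly_budget=1)
print(fire, which)
```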

2) Collect the right information
As we move from more traditional, client-server era environments to cloud era environments, many teams still use monitoring that has been optimized for the previous era (and tends to be siloed and component-based, too).

It’s not enough to just look at what’s happening with a particular domain or what’s going on with up-down indicators. In the cloud era, you need to look at performance that’s more aligned with the business and the user experience, and move away from a view focused on a particular functional silo or resource.

By putting those metrics into a form that an end user can relate to, you can give your audience better visibility and improve their experience. For example, if you were to measure the response time of a particular transaction, when a user calls in and says, “It’s slow today,” you can check the dynamic thresholds generated by the analytics that show the normal behavior for that transaction and time period. If indeed the response times are within the normal range, you can show the user that although the system may seem slow, it’s the expected behavior. If on the other hand the response times are higher than normal, a ticket could be generated for the appropriate support team to investigate. Ideally, the system would have already generated an alert that was being researched if a KPI Smart Alert had been set up within vRealize Operations Manager for that transaction response time.
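As a small, hypothetical illustration of that decision, the helper below compares a reported response time with the upper bound of the transaction’s normal range and either reassures the user or opens a ticket; the ticketing call is a stub standing in for whatever service desk integration you use.

```python
# Hypothetical helper: compare a reported response time with the upper bound
# of the transaction's normal range (as produced by the analytics engine)
# and either reassure the user or open a ticket. open_ticket is a stub.
def handle_slowness_report(transaction, response_ms, expected_high_ms, open_ticket):
    if response_ms <= expected_high_ms:
        # Within the normal signature for this time period: no ticket needed.
        return (f"{transaction}: {response_ms} ms is within the expected range "
                f"(up to {expected_high_ms} ms) for this time period")
    # Above the normal range: raise a ticket for the support team.
    ticket_id = open_ticket(transaction, response_ms)
    return f"{transaction}: response time above normal, ticket {ticket_id} opened"

# Example usage with a stubbed ticketing call:
print(handle_slowness_report("checkout", 950, 600, lambda t, ms: "INC-1042"))
```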

3) Understand how to best use the new alerts

You may be wondering: Now that I have these great new alerts enabled by dynamic thresholds, how can I best leverage them? Although they are far more actionable than the previous metric-based alerts, the new alerts may still need some form of human interaction to make sure the proper action is taken. For example, it is often suggested that when a particular cluster in a virtualized environment starts having performance issues, an alert should be generated that automatically bursts its capacity. The problem with this approach is that although performance issues can indicate a capacity shortfall, they can also indicate a break in the environment.

The idea is to give the user as much information as they need when an alert is generated to make a quick, well-informed decision, and then to have automations available to carry out that decision quickly and accurately. Over time, automations can include more and more intelligence, but it’s still hard to replace the human touch when it comes to decision making.
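Here is an illustrative sketch of that pattern: the alert carries enough context for a quick decision, and automation runs only after an operator chooses an action. The alert fields and remediation actions are hypothetical.

```python
# Illustrative sketch of keeping a person in the loop: the alert carries the
# context needed for a quick decision, and automations only run once an
# operator has chosen one. The actions shown are hypothetical placeholders.

def add_cluster_capacity(cluster):
    print(f"Requesting an additional host for cluster {cluster}")

def open_investigation(cluster):
    print(f"Opening an investigation ticket for cluster {cluster}")

ACTIONS = {
    "add_capacity": add_cluster_capacity,
    "investigate": open_investigation,
}

def handle_cluster_alert(alert, decision):
    """Show the operator the context, then run the automation they choose."""
    print(f"Cluster {alert['cluster']}: {alert['symptom']}")
    print(f"Recent change events: {alert['recent_changes']}")
    print(f"Capacity remaining: {alert['capacity_remaining_pct']}%")
    ACTIONS[decision](alert["cluster"])

alert = {
    "cluster": "prod-cluster-02",
    "symptom": "latency above normal range for 30 minutes",
    "recent_changes": 3,
    "capacity_remaining_pct": 12,
}
# An operator reviews the context and decides capacity, not a fault, is the issue.
handle_cluster_alert(alert, decision="add_capacity")
```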

4) Improve the use of dynamic thresholds
A lot of monitoring tools are used after an issue materializes. But implementing proactive processes gives you the opportunity to identify or fix an issue before it impacts users. It’s essential that the link to problem management be very strong so processes can be tightly integrated, as shown in figure 1.

Figure 1: Event incident problem cycle

During the Problem Management Root Cause Analysis process, behaviors or metrics are often identified that are leading indicators of imminent impacts to the user experience. As mentioned earlier, vRealize Operations Manager, as the analytics engine, can create both KPI and Early Warning smart alerts at the infrastructure, application, and end-user levels to alert on these behaviors or metrics. By instrumenting these key metrics within the tool, you can create actionable alerts in the environment.

5) Ensure the team has the right roles to support the changing environment
With the newfound capabilities enabled by an analytics engine like vRealize Operations Manager, roles and team structure become more critical. As shown in figure 2 below, the analyst role is there to identify and document opportunities for improvement, as well as to report on the KPIs that indicate the effectiveness of the alerts already in place. In addition, developers are needed to build the new alerts and other content within vRealize Operations Manager.

Figure 2: New roles to support the changing environment

In a small organization, one person may be performing all of these functions, while in a larger organization, an entire team may perform a single role. This structure can be flexible depending on the size of the organization, but these roles are all critical to leveraging the capabilities of vRealize Operations Manager.

By implementing the right metrics, right KPIs, right level of automation, and putting the right team in place, you’ll be primed for success in the cloud era.

—-
Richard Benoit is an Operations Architect with the VMware Operations Transformation global practice.