Two forces are driving today’s and tomorrow’s use of information technology. First, software is key to driving business outcomes, both for staff productivity and communication, and for creating and exploiting competitive advantages. Second, all software needs to run on a cloud of some kind, be it public, private, or, as is increasingly the case, hybrid.
In fact, the term “multi-cloud” often applies to modern IT infrastructures, because it reflects the increasing presence of multiple hybrid cloud architectures within the scope (if not the confines) of a single organization. This is where hyperconverged infrastructure, or HCI, comes into play.
HCI provides the means to make multi-cloud operations both efficient and agile. The timing couldn’t be better, as companies are increasingly moving from traditional operations into a hybrid, multi-cloud future. This transition also introduces the need for a new type of operations, one aimed specifically at HCI. Let’s call this new take on the traditional IT management discipline of operations management “HCI Ops.”
HCI Ops Defined
HCI Ops is a foundational discipline that applies to all businesses and organizations. It provides important tools and technologies to help its users deliver necessary performance and meet business needs. It offers insights to help them bring up new data centers or deal with consolidation and modernization projects.
Above all, it seeks to enable ways to cut costs. This is done primarily through managing capital expenditures (CapEx), operational expenditures (OpEx), and licensing fees more efficiently and dynamically, as well as helping organizations optimize use of existing resources.
It does this by deploying software-defined data center (SDDC) technologies and integrating multiple private/public cloud connections. This approach provides the proverbial “single pane of glass” to unify visibility into all the systems involved. In turn, this helps to accelerate business and technical decisions, as well as troubleshooting and problem resolution.
Where does HCI Ops fit in? It uses machine learning (ML) and artificial intelligence (AI) to help organizations plan, optimize and scale their SDDC, extending it to hybrid, multi-cloud deployments. The goal of HCI Ops is to run production operations hands-off and hassle-free.
A unified management platform drives the HCI Ops effort across the board. It draws from expressions of operational and business intent and predictive analytics to deliver continuous performance optimization, proactive planning, efficient capacity management, intelligent remediation and integrated compliance. The key elements for HCI Ops include management and planning for the following operations:
- workloads
- capacity
- monitoring and troubleshooting
Let’s explore each of these operations areas to better understand what they mean, how they work, and what they contribute to HCI Ops.
Workload Planning, Balancing, and Optimization
The goal that drives workload planning is best understood as “continuous performance optimization.” HCI Ops automatically and continuously balances workloads across clusters based on formal expressions of business and operational intent (priorities, preferences, profit and growth targets, and so on).
Real-time predictive analytics drives this balancing act, which also proactively avoids contention for resources and access. Organizations can choose to optimize workload balancing based on combinations of cost, performance, software license management, or consolidation. HCI Ops continuously verifies workload performance against those expressions of intent, and uses the same data to project future requirements. Workloads may be rebalanced automatically at any time, or organizations can defer balancing to a scheduled maintenance window.
Built-in operations management and automation tools allow HCI Ops to make initial workload placements. They also handle ongoing placement regimes, to make best use of resources and to comply with expressions of business intent.
Placement zones across hosts may be defined irrespective of cluster boundaries. Thus, workload placement and balancing can be tuned for software license enforcement, service tiers, or other specific workload tags or designations. The same applies to scheduling of distributed resources using VMware DRS (Distributed Resource Scheduler).
In the same way, vSAN clusters are subject to workload balancing that incorporates management for resync, slack space, and storage policies. This also supports combining predictive analytics from vRealize Operations with DRS, so organizations can analyze and predict future demand. When contention looms, it lets them move workloads proactively to avoid such issues rather than having to remediate them after the fact.
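The proactive balancing described in this section can be illustrated with a small sketch. It is not VMware DRS or vRealize Operations code. The `Host` class, the `rebalance` function, and the 80% utilization ceiling are all hypothetical, chosen only to show the idea of moving workloads before predicted contention occurs.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host model; not a VMware API object."""
    name: str
    cpu_capacity: float                            # e.g. GHz available on the host
    workloads: dict = field(default_factory=dict)  # workload name -> predicted demand

    def predicted_utilization(self) -> float:
        return sum(self.workloads.values()) / self.cpu_capacity


def rebalance(hosts, max_utilization=0.8):
    """Greedy sketch: move the smallest workloads off any host whose
    predicted utilization exceeds the (assumed) ceiling."""
    moves = []
    for src in hosts:
        while src.workloads and src.predicted_utilization() > max_utilization:
            name, demand = min(src.workloads.items(), key=lambda kv: kv[1])
            # Only consider targets that stay under the ceiling after the move.
            candidates = [
                h for h in hosts
                if h is not src
                and (sum(h.workloads.values()) + demand) / h.cpu_capacity
                <= max_utilization
            ]
            if not candidates:
                break  # nowhere safe to move; leave for manual review
            dst = min(candidates, key=Host.predicted_utilization)
            del src.workloads[name]
            dst.workloads[name] = demand
            moves.append((name, src.name, dst.name))
    return moves


if __name__ == "__main__":
    a = Host("host-a", 10.0, {"db": 6.0, "web": 3.0})
    b = Host("host-b", 10.0, {"batch": 2.0})
    print(rebalance([a, b]))
```

A real scheduler would weigh many more signals (memory, storage policies, license constraints, business intent), so treat this only as a picture of the "move before contention" principle.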
Managing and Planning Capacity
HCI Ops supports real-time capacity analytics to let organizations optimize utilization, achieve cost savings, and attain proper consolidation. At the same time, it promotes proactive capacity planning and procurement regimes.
This aspect of HCI Ops is based on formal capacity planning tools that model capacity using flexible, “what-if” scenarios. Such scenarios can straddle multiple clouds, and be stacked together to illuminate overall impact.
Modeling capabilities include finding the best fit for new workloads, assessing the impact of removing workloads, adding or decommissioning hardware, adding HCI capacity to a vSAN cluster, and planning public cloud migrations. Organizations can even run side-by-side cost comparisons of multiple deployment strategies, including private cloud, VMware Cloud on AWS, or third-party deployments on AWS, Microsoft Azure, Google, IBM, VMware VCPP partners, or custom clouds.
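To make the idea of stacking “what-if” scenarios concrete, here is a minimal sketch. The `Scenario` class, the `project` function, and the figures used are assumptions made for illustration, not part of any VMware tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical what-if scenario expressed as simple deltas."""
    name: str
    demand_delta_gb: float = 0.0    # workloads added (+) or removed (-)
    capacity_delta_gb: float = 0.0  # hardware added (+) or decommissioned (-)


def project(capacity_gb, demand_gb, scenarios):
    """Apply the scenarios in order and report the combined impact."""
    for s in scenarios:
        capacity_gb += s.capacity_delta_gb
        demand_gb += s.demand_delta_gb
    if capacity_gb <= 0:
        raise ValueError("scenarios leave no capacity to model")
    return {
        "capacity_gb": capacity_gb,
        "demand_gb": demand_gb,
        "remaining_gb": capacity_gb - demand_gb,
        "utilization": demand_gb / capacity_gb,
    }


if __name__ == "__main__":
    stacked = [
        Scenario("onboard new app", demand_delta_gb=200.0),
        Scenario("add vSAN node", capacity_delta_gb=500.0),
        Scenario("retire old host", capacity_delta_gb=-100.0),
    ]
    print(project(1000.0, 700.0, stacked))
```

Keeping each scenario as an independent delta is what lets them be combined freely, which mirrors the stacking behavior the text describes.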
Using HCI Ops, organizations can also undertake reclamation and right-sizing tasks. The former supports reclaiming of overprovisioned or orphaned virtual hard disks (VMDKs), or idle capacity. The latter enables right-sizing/resizing of virtual machines within the constraints set by expressions of business and operations intent. Predictive analytics make these exercises proactive, based on alerts driven by capacity usage, demand, and current resource allocations.
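The distinction between reclamation and right-sizing can be sketched as follows. The `VmUsage` class, the `recommend` function, and the thresholds (25% headroom, 5% idle cutoff) are illustrative assumptions. An actual product would derive its recommendations from much richer usage history and from the organization's stated intent.

```python
import math
from dataclasses import dataclass


@dataclass
class VmUsage:
    """Hypothetical usage summary for one virtual machine."""
    name: str
    allocated_vcpus: int
    peak_vcpus_used: float


def recommend(vms, headroom=0.25, idle_threshold=0.05, min_vcpus=1):
    """Return {vm name: (action, target vCPUs)} under the assumed thresholds."""
    recommendations = {}
    for vm in vms:
        if vm.peak_vcpus_used / vm.allocated_vcpus < idle_threshold:
            # Nearly unused: candidate for reclamation rather than resizing.
            recommendations[vm.name] = ("reclaim", 0)
            continue
        target = max(min_vcpus, math.ceil(vm.peak_vcpus_used * (1 + headroom)))
        if target < vm.allocated_vcpus:
            recommendations[vm.name] = ("downsize", target)
        elif target > vm.allocated_vcpus:
            recommendations[vm.name] = ("upsize", target)
        else:
            recommendations[vm.name] = ("keep", target)
    return recommendations


if __name__ == "__main__":
    fleet = [
        VmUsage("app", 8, 2.0),
        VmUsage("old", 4, 0.1),
        VmUsage("db", 4, 3.5),
    ]
    for name, rec in recommend(fleet).items():
        print(name, rec)
```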
The tool can also make actionable recommendations to help drive reclamation, procurement, and cloud migration plans. In the same vein, HCI Ops can also combine capacity analytics and cost information. This lets organizations track how operational efficiency and capacity management drive cost efficiency. Metrics illuminate TCO and fine-grained savings opportunities, and let users define and tune cost drivers. These may then be used to drive capacity and cost optimizations.
Monitoring and Troubleshooting HCI Operations
Because HCI Ops offers a single, coherent view of operations and associated activities and efforts, it also offers global and actionable insights that correlate metrics and logs across all applications and the entire infrastructure in use. IT operations can be centralized, based on native SDDC integrations and federated views across clouds and systems.
This comes from a highly scalable and extensible platform that adapts to match constituent components and real-time conditions, performance, and utilization. Keys to this capability include a holistic, unified IT operations view (“Unified Observability”) into applications and infrastructure health.
Customizable dashboards start from actionable out-of-the-box personae predefined to fit specific job roles and workflows. These not only support robust views of system health and performance, they also facilitate quick and accurate troubleshooting. Admins can correlate metrics with objects to create “super metrics.” They can also integrate directly with common ticketing systems (e.g., ServiceNow) and automate remediation. Dashboards, reports, and views are all customizable, ready to support unique workflows across infrastructure, applications, and operations teams.
HCI Ops supports 360-degree troubleshooting, which employs metrics and logs side-by-side in actual runtime contexts. If integrated, vRealize Operations and vRealize Log Insight can assemble structured data (metrics and KPIs) with unstructured data (log files) to speed root cause analysis.
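The value of viewing metrics and logs side-by-side can be shown with a small, generic sketch. It does not use the vRealize Operations or vRealize Log Insight APIs. The `correlate` function, the sample data, and the two-minute window are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def correlate(metrics, logs, threshold, window=timedelta(minutes=2)):
    """For each metric sample above the threshold, collect log messages
    recorded within the given time window around that sample.

    metrics: iterable of (timestamp, value) pairs (structured data)
    logs:    iterable of (timestamp, message) pairs (unstructured data)
    """
    logs = list(logs)  # allow repeated scans even if a generator is passed
    findings = []
    for ts, value in metrics:
        if value > threshold:
            nearby = [msg for log_ts, msg in logs if abs(log_ts - ts) <= window]
            findings.append((ts, value, nearby))
    return findings


if __name__ == "__main__":
    day = datetime(2020, 1, 1)
    cpu = [(day.replace(hour=10, minute=0), 40.0),
           (day.replace(hour=10, minute=5), 95.0)]
    events = [(day.replace(hour=10, minute=4), "disk latency warning"),
              (day.replace(hour=10, minute=20), "backup finished")]
    for finding in correlate(cpu, events, threshold=90.0):
        print(finding)
```

Joining on time is the simplest possible correlation. A production platform would also match on the affected object (host, VM, datastore) so that unrelated log lines from the same moment are filtered out.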
Organizations can operationalize and scale VMware SDDC components (vCenter, vSAN, VMware Cloud Foundation, and more) to gain insight into SDDC health, performance, and troubleshooting capabilities. vCenter can also support global operations with overviews of vSphere and vSAN environments, including KPIs, critical alerts, capacity overview, plus operational insights and recommendations. Admins can drill down into dashboards for full-stack troubleshooting and capacity management.
HCI Ops also makes use of vRealize Operations’ ability to discover VMs and apps automatically, and to deploy agents with full lifecycle management capabilities. This is what brings native-level management capability into the HCI Ops picture for common packaged applications, multiple host and virtual OSes, and more.
This also enables visualization of relationships between applications and infrastructure elements, for full-stack views in real time that can speed root cause analysis when troubleshooting performance and availability issues. The overall environment is both open and extensible, so it can accommodate high-complexity environments that may span multiple data centers. Domain-specific Management Packs are available from VMware, as well as from third-party hardware and application vendors.
Making the Most of Hybrid, Multi-Cloud HCI
HCI Ops provides the tools and technologies to unify management across platforms, tools, and geographies that can include a mix of private and public clouds and multiple data centers. Essentially, it creates a “self-driving” model for modern IT operations that organizations can employ to reduce costs, increase operational efficiencies, and ensure that operations match business goals and intents.
To start HCI Ops at your company, give VMware vSAN and vRealize Operations a try. More information is available at the VMware vRealize Operations homepage.