
The Future of Storage and Data Management: A Long-Term Perspective

Over the last few years, a gradual but profound shift has been taking place in the storage industry. Storage products are moving away from traditional “big iron” appliances and Storage Area Networks (SANs). Storage is increasingly seen as an inherent part of an organization’s IT infrastructure, an asset managed by teams of generalist IT professionals rather than dedicated teams of storage specialists. The rapid emergence of a new generation of storage technologies, including NVMe and new storage physics such as Persistent Memory (PMEM for short), favors the use of commodity hardware components combined with pure software-based storage platforms.

Of course, public clouds have pushed the envelope further: why even own storage assets if one can consume storage as a service? The cloud model and its economics may not always work, but even when an IT organization owns its storage assets, public clouds still set the bar for operational efficiency and scale. Ultimately, storage consumers care about having access to and utilizing their data in the most cost-effective way, not about the esoteric properties of physical storage.

We see this industry shift happening along three dimensions:


1. Operational Model

Gone are the days of managing storage as an island (OK, perhaps an archipelago) of specialized IT assets with dedicated teams of specialists and costly vendor services. Cloud providers do not buy and operate disk arrays; traditional enterprise storage would not scale either economically or operationally. To approximate the operational efficiencies of clouds, even customers who own and run their own infrastructure strive for management models that allow them to take advantage of the cost benefits and fast evolution of commodity hardware. They look for software platforms and tools to manage their entire IT infrastructure in a scalable, efficient way that eliminates IT silos.

This is really the value that a Hyper-Converged Infrastructure (HCI) management model brings to the table – operational simplicity and scalability. HCI is not about the storage platform (even though software-defined storage like VMware vSAN is an enabler). It is not necessarily about running the storage stack and workloads on the same hosts. A dense storage-only cluster serving many compute-only hosts is still HCI, if all those hosts are managed through a single software stack.

The benefits of operational simplicity also apply to hybrid clouds like VMware Cloud on AWS. IT organizations get a single operational model, with the abstractions, workflows and tools it entails, whether they manage their own physical IT infrastructure or consume infrastructure as a service.

VMware is uniquely positioned to deliver on this model based on our industry-leading hypervisor (vSphere), network virtualization platform (NSX) and software storage platform (vSAN).

vSAN is a cornerstone of VMware’s hybrid cloud strategy. Unlike public cloud block storage, vSAN offers an enterprise-grade shared storage abstraction that is required for key vSphere features including HA, vMotion, DRS and FT. It enables vSphere workloads to run unmodified on VMware Cloud on AWS as well as other public clouds via VMware’s Cloud Provider Program. VMware customers have the freedom to consume infrastructure in the most cost-effective way according to their needs, while relying on a single operational model and without requiring application re-platforming.


2. Data Lifecycle Management

Customers see data protection as a natural use case for hybrid clouds given the compelling economics of public cloud archival storage and the benefits of pay-as-you-go for DR. VMware is already offering DR as a Service for VMware Cloud on AWS and for VMware’s Cloud Provider Program participants.

We anticipate increasing demand for data protection services that are an inherent part of the storage platform behind hybrid clouds and of the HCI operational model they utilize. Customers demand comprehensive services that cover a range of Backup, Disaster Recovery and Data Archival solutions, under an equally comprehensive and intuitive operational model. Our view is that the point-solution products found in traditional IT environments do not scale well in hybrid clouds.

We expect software-defined storage vendors, including the main HCI vendors, to invest heavily in technologies for data lifecycle management. Features such as snapshot management and data transport engines are already included with most products. Going forward, those features will have to cover a broader set of application abstractions beyond VMs and block volumes. Data lifecycle management will become an inherent part of the HCI model, with policies used to cope with scale and complexity.

Additionally, archived data is becoming a valuable asset for all customers. Use cases are emerging that involve running analytics workloads on archived data. Initially, we anticipate opportunities to perform data forensics and offer data governance services. Further out, we expect software-defined storage vendors to start offering tools for generic data analysis. Both use cases will utilize Machine Learning (ML) and Artificial Intelligence (AI) technologies, including those developed within VMware. Ultimately, we envision an ecosystem developing in this space, where specialized vendors plug their analytics tools into generic storage and management platforms.


3. Application-Centric Storage and Data Services

Today, hypervisors and typical storage platforms target traditional enterprise applications such as relational databases. However, the application landscape is changing rapidly with the emergence of technologies such as containers and methodologies such as microservices. These modern, cloud-native applications are designed with public cloud services in mind, not the traditional IT infrastructure found on-premises. Distribution, scaling and fault tolerance are features often implemented by the application itself, which typically assumes a shared-nothing storage model (local disks).

On the other hand, customers still need enterprise-grade, production-quality persistent state for cloud-native applications, including support for data integrity and security, storage management and data protection.

We are working to make vSAN an ideal platform for serving both traditional enterprise workloads and cloud-native applications. First, we are developing policies that enforce the affinity of an application instance to the host where its data set resides.

The key value proposition of HCI with vSphere and vSAN is that users can consolidate both traditional applications and new-generation, shared-nothing applications on the same cluster. Via policies, each application utilizes only the features of the platform that it requires: nothing more, nothing less.
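
As a minimal, purely illustrative sketch of that idea (the rule names below mirror vSAN’s SPBM capabilities, but the Python shorthand is ours, not product syntax), consider the policies two such applications might carry on the same cluster:

    # Illustrative shorthand only: per-application storage policies on a
    # single vSAN cluster. Rule names mirror vSAN SPBM capabilities.
    policies = {
        # A traditional database relies on the platform for availability
        # and integrity: tolerate one host failure, keep checksums on.
        "erp-database": {
            "hostFailuresToTolerate": 1,
            "checksumDisabled": False,
            "stripeWidth": 2,
        },
        # A shared-nothing, cloud-native service replicates at the
        # application layer, so it opts out of platform-level redundancy;
        # what it wants instead is placement next to its data (host affinity).
        "kv-store-node": {
            "hostFailuresToTolerate": 0,
            "checksumDisabled": False,
        },
    }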


Second, we are supporting interfaces and abstractions that cater to the needs of cloud-native apps and a DevOps model. Project Hatchway offers volume drivers for the most common orchestration tools for containerized applications:

  • vSphere Cloud Provider (VCP) is natively built into Kubernetes. By default, it is included in Pivotal Container Service (PKS).
  • vSphere Docker Volume Service for applications orchestrated by Docker Swarm (see the sketch below).
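
To make the Docker path concrete, here is a minimal sketch using the Docker SDK for Python. It assumes the vSphere Docker Volume Service plugin is installed on the Docker host; the volume size and the policy name "gold" are placeholders for values an administrator would define in vCenter:

    # Sketch: create a container volume backed by vSAN through the
    # vSphere Docker Volume Service (vDVS) driver.
    import docker

    client = docker.from_env()
    volume = client.volumes.create(
        name="db-data",
        driver="vsphere",                # vDVS volume driver
        driver_opts={
            "size": "10gb",              # capacity of the backing virtual disk
            "vsan-policy-name": "gold",  # placeholder SPBM policy from vCenter
        },
    )
    print(volume.name, volume.attrs.get("Driver"))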

Both offerings use the same storage policy-based management (SPBM) model we use for VMs and virtual disks. Via policies, one can use enterprise features such as checksums, space efficiency, encryption and, in the future, Data Protection and Disaster Recovery, all at the granularity of individual container volumes.
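
On the Kubernetes side, a minimal sketch using the official Python client shows the same model at work: a StorageClass points the VCP provisioner at an SPBM policy, and every volume claimed through that class inherits the policy. The class name, the policy name "gold" and the sizes are placeholders:

    # Sketch: dynamic provisioning of a vSAN-backed container volume in
    # Kubernetes via the vSphere Cloud Provider (VCP).
    from kubernetes import client, config

    config.load_kube_config()

    # StorageClass: volumes provisioned through it inherit the SPBM policy.
    sc = client.V1StorageClass(
        metadata=client.V1ObjectMeta(name="vsan-gold"),
        provisioner="kubernetes.io/vsphere-volume",
        parameters={
            "diskformat": "thin",
            "storagePolicyName": "gold",  # placeholder policy from vCenter
        },
    )
    client.StorageV1Api().create_storage_class(sc)

    # PersistentVolumeClaim: an application's request for a policied volume.
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="db-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            storage_class_name="vsan-gold",
            resources=client.V1ResourceRequirements(
                requests={"storage": "10Gi"}
            ),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim("default", pvc)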

The new generation of applications consumes data through abstractions that are higher level than block storage. File services and object storage are used for storing and managing unstructured data. Furthermore, many modern applications are designed around higher-level data abstractions such as key-value stores, databases, message buses and log services. Such abstractions have profound benefits for the rapid development and scalable deployment of applications.

When they are offered as a service, things are straightforward. However, that implies customers being bound to certain cloud providers’ services. And when IT organizations need to deploy and manage such services in-house, they face a multitude of operational challenges. Operations like deployment, configuration and monitoring are specific to each service. Conceptually simple actions such as storage extension or the replacement of faulty devices turn into complex manual workflows. Each key-value store and each database has its own data protection and disaster recovery mechanism (if any at all). Soon, the operating cost of such services becomes prohibitive for the typical organization, which does not have the expertise and resources of hyperscalers.

We anticipate the emergence of a new generation of application-centric control planes that will implement a canonical set of operational workflows for data services. An example of things to come is Amazon’s Relational Database Service (RDS). It has enabled thousands of organizations to consume databases as a service, while allowing AWS to manage thousands of database instances through an efficient, unified operational model. Enterprises will aim for similar operational efficiencies for data services they offer in-house or when their IT infrastructure spans multiple clouds.

The control plane will orchestrate data service deployment and setup. It will also implement resource provisioning, configuration workflows (e.g., storage expansion) and software lifecycle management. A combination of virtualization and container orchestration will be used to implement these workflows.

In addition, the control plane will include generic features for data protection and data copy management across cloud boundaries: snapshots, clones and data mobility. We expect that some form of application-level integration may be required to implement efficient data protection (e.g., capturing an application-consistent copy of a database log). Nevertheless, having a unified set of abstractions and workflows that is independent of the data services deployed and decoupled from the physical storage infrastructure will result in big operational benefits. Enterprises will be able to manage their data and enforce business continuity policies even while using a multitude of storage platforms and data services across clouds.
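
No such canonical control plane exists today, so purely as a thought experiment, its surface might look something like the sketch below; every name and signature here is an assumption, not a shipping API:

    # Hypothetical sketch: a uniform lifecycle interface that is independent
    # of the data service (key-value store, database, message bus) beneath it.
    from abc import ABC, abstractmethod

    class DataServiceControlPlane(ABC):

        @abstractmethod
        def deploy(self, service: str, instances: int) -> str:
            """Provision resources and set up a new service instance."""

        @abstractmethod
        def expand_storage(self, instance_id: str, extra_gb: int) -> None:
            """Grow capacity without service-specific manual workflows."""

        @abstractmethod
        def snapshot(self, instance_id: str, app_consistent: bool = True) -> str:
            """Capture an (optionally application-consistent) copy."""

        @abstractmethod
        def restore(self, snapshot_id: str, target_cloud: str) -> str:
            """Re-create an instance from a snapshot, possibly in another cloud."""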


In summary, the industry is pivoting from storage products for the data center to holistic data management solutions for cloud environments. At VMware, we are building on the strengths of our HCI offerings, which we are expanding into data protection and data services across private and public clouds. Our product roadmap is rapidly evolving to cover both traditional enterprise use cases and the requirements of cloud-native apps and the operational model emerging with them (DevOps).