There’s been a fair amount of digital ink spilled lately over the topic of “data locality”. If you missed the discussion, you can read a detailed analysis by DeepStorage.net in their published report. But to save you some time, here’s the quick recap:
Data Locality (DL) in the context of Hyper-Converged Infrastructure (HCI) refers to ensuring that a copy of the data required by a virtual machine (VM) always resides on the same physical server as the VM itself. The basic idea is that you can improve application performance by keeping the data on the same server as the workload, eliminating the need for network-based data access. While this seems like a good idea on the surface, in today’s fast-evolving, agile IT infrastructure built on networks that are fast relative to storage, data locality has its limits.
When Does Data Locality Make Sense?
It is certainly true that some level of general data locality makes sense, especially where slower WAN or internet links are involved. For example, data locality makes perfect sense for an active/active stretched cluster between data centers in different cities or for the caching tier of a storage system.
Unfortunately, there are a number of gotchas when it comes to broader uses of data locality, that is, for persistent storage.
- Data Locality only helps read operations. Writes are generally not acknowledged back to the application until the data has been written to the mirroring node, which means every write must traverse the network anyway.
- RAID-based systems depend on network access. If the HCI system uses erasure coding (RAID-5/6, for example) to stripe data across nodes, then reads must cross the network to reassemble the data, making Data Locality less impactful.
- Network latency continues to decrease. The switched networks used today are much faster and less resource constrained than what was prevalent just two years ago. Today, 10G networks have become the standard, and the industry is moving rapidly toward 25G and 40G, which implies that the additional latency from network-based storage reads will usually be negligible.
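To make the reasoning in these three points concrete, here is a minimal back-of-the-envelope sketch. It is not a model of vSAN or of any particular HCI product: the class, the function names, and every latency figure are hypothetical values chosen only for illustration, and the model assumes a simple two-way mirror where the local and remote copies are written in parallel.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Illustrative per-operation latencies in microseconds.

    All three defaults are assumptions made up for this sketch,
    not measurements of any real system.
    """
    media_read_us: float = 100.0   # assumed device read latency
    media_write_us: float = 30.0   # assumed device write latency
    network_rtt_us: float = 20.0   # assumed network round trip between nodes


def read_latency(lat: Latencies, local: bool) -> float:
    """A local read skips the network; a remote read pays one round trip."""
    return lat.media_read_us + (0.0 if local else lat.network_rtt_us)


def write_latency(lat: Latencies) -> float:
    """A mirrored write is acknowledged only after the remote copy lands.

    Local and remote writes are assumed to run in parallel, so the slower
    (remote) path sets the latency whether or not a local copy exists.
    """
    local_path = lat.media_write_us
    remote_path = lat.media_write_us + lat.network_rtt_us
    return max(local_path, remote_path)


def mean_latency(lat: Latencies, read_fraction: float, local_reads: bool) -> float:
    """Average latency for a workload with the given share of reads."""
    reads = read_fraction * read_latency(lat, local_reads)
    writes = (1.0 - read_fraction) * write_latency(lat)
    return reads + writes


if __name__ == "__main__":
    lat = Latencies()
    for read_fraction in (0.5, 0.7, 0.9):
        with_dl = mean_latency(lat, read_fraction, local_reads=True)
        without_dl = mean_latency(lat, read_fraction, local_reads=False)
        print(f"reads={read_fraction:.0%}  with DL={with_dl:.1f} us  "
              f"without DL={without_dl:.1f} us")
```

Under these assumed numbers, write latency is identical with or without locality, and the only saving is the network round trip on reads. As the assumed round trip shrinks relative to the device latency, that saving shrinks with it.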
Managing Data in a Rapidly Evolving IT Landscape
Applications and workloads are becoming more fluid with respect to scale and location. Data is no longer pinned to specific physical infrastructure, making it harder to know where data resides as new distributed architectures like HCI, Edge, and Hybrid Cloud grow in popularity. Organizations are moving to hybrid cloud options and transient cloud-native, container-based applications. vMotion, DRS, IaaS and cloud bursting are all mainstream technologies.
The fluid nature of data and applications means it is no longer desirable for storage systems to pin down an application to a specific spot; in fact, it is now limiting to do so. When a VM or its data moves, the data locality performance benefits are lost, making application performance inconsistent and unpredictable. In theory, you could try to implement a solution that has the data physically chase the VM, or the other way around, but this approach is complex, uses resources poorly, and is far from optimal.
At VMware, we consciously architected vSAN with today’s world of data and app mobility in mind. vSAN is built to deliver “Data Ubiquity”, that is, consistent access to data independent of physical constraints. Customers want their data to be highly available, and to be delivered consistently where it is needed with zero operational overhead. IT administrators want the flexibility to shift applications around, rebalance infrastructure, and deliver dynamic, on-demand solutions without impacting application performance or access. Their focus is on flexibility, agility and availability, while maximizing performance such that the physical location of data is not a constraint.
Security is also top of mind for customers. Data Locality is sometimes mentioned as a way to improve security, because “you know where the data is”. Again, today’s reality is that applications move, and the data is also replicated for backup and disaster recovery, so locating all the places where the data may reside is in fact a much broader challenge. A more elegant solution for data security is data-at-rest encryption, not data locality.
Delivering Data Ubiquity through Modern HCI
Today’s businesses need to be agile to succeed, and they are looking for IT solutions that allow them to keep up with the rapid pace of innovation. They are looking for ways to break down silos and consolidate workloads on a single platform that delivers the performance, scalability and simplicity they need. VMware vSAN delivers Data Ubiquity, which is well suited to mixed workloads that need a common view of data across clouds (edge to core to public cloud), across applications (traditional to cloud native applications) and across fault domains (cluster to stretched cluster to availability zones).