
Evolution of HCI Part 2: Future of All Flash Virtual SAN

Overview

In his last blog post, Vijay Ramachandran described Virtual SAN’s biggest and most anticipated release to date – Virtual SAN 6.2. He also discussed the rationale behind investing in space efficiency features on All Flash only. Coincidentally, Howard Marks also published a very good blog post discussing the economics of All Flash and why he believes that All Flash is the future of storage.


In this blog, I’d like to build on those foundations by providing insight into some of the new trends and technologies in the flash and networking space. I will also discuss how these technologies apply to the Hyper-Converged Infrastructure (HCI) market in general, and how Virtual SAN could potentially take advantage of them to deliver faster and more efficient storage to our customer base.

Key Technology Trends


Three key technology trends to note are:

  1. CPU densities are increasing towards hexa/octa core systems
  2. High density flash devices at low price points are becoming more prevalent
  3. High speed local storage (NVMe/PCIe) and network interconnects (100GbE, RDMA) are gaining popularity as well

Some of these new technologies are already an integral part of Hyper-Converged Infrastructure (HCI) – major OEM server platforms provide these hardware devices as part of their standard x86 server offerings today. For instance, Virtual SAN Ready Nodes are already certified to use high end (high performance, high endurance) NVMe devices as the caching tier for All Flash. VMware aims to continue to innovate in order to provide lightning fast, hyper-efficient software-defined storage on All Flash for all kinds of workloads. We believe that Virtual SAN is uniquely positioned to take advantage of these newer storage and networking interconnects to achieve exactly that.

Evolution of Faster Storage & Faster Network

Before we get into details of these high speed interconnects and their applicability to Virtual SAN, it is important to first take a look at how network and storage technologies have evolved over the years.


A few years back, latencies/response times for I/O reads and writes over a 10GbE network were significantly lower (on the order of sub-millisecond) than local disk latencies on NAND-based SSDs (on the order of milliseconds). However, with the advent of faster storage interconnects like PCIe/NVMe and the new and upcoming 3D XPoint (highly dense NVDIMMs), local disk or persistent storage latencies have been reduced to microseconds or, in some cases, even nanoseconds. To put it simply, local storage built on these newer devices has become much faster than the network. As a result, All Flash storage vendors are now looking to further optimize network fetch times by leveraging 40GbE/100GbE/InfiniBand networks and/or implementing a fast network access protocol like RDMA.
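
To make the shift concrete, here is a rough back-of-the-envelope sketch (my own illustration, not from the original comparison) that computes the raw wire time of a 4 KB block at different Ethernet speeds and contrasts it with illustrative, ballpark device latencies; the device numbers are assumptions for the sake of the example, not measurements.

    /*
     * Back-of-the-envelope: wire time of a 4 KB block vs. illustrative
     * (assumed) device read latencies.  Compile: cc -o lat lat.c
     */
    #include <stdio.h>

    int main(void) {
        const double block_bits  = 4096.0 * 8.0;           /* one 4 KB block  */
        const double link_gbps[] = { 10.0, 40.0, 100.0 };   /* Ethernet speeds */

        for (int i = 0; i < 3; i++) {
            double wire_us = block_bits / (link_gbps[i] * 1e9) * 1e6;
            printf("%3.0f GbE: ~%.2f us on the wire (excluding stack and switch overhead)\n",
                   link_gbps[i], wire_us);
        }

        /* Illustrative device read latencies -- order-of-magnitude assumptions only. */
        printf("NAND SSD read        : ~100 us\n");
        printf("NVMe / 3D XPoint read: ~10 us or less\n");
        printf("DRAM access          : ~0.1 us\n");
        return 0;
    }

Once the device itself can complete a read in roughly the time it takes a 4 KB block to cross a 10GbE link plus the TCP/IP stack and switching overhead, the network rather than the media becomes the bottleneck, which is exactly what pushes vendors toward faster links and RDMA.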

Key Characteristics of Purpose Built All Flash Appliances

Having discussed how storage and network speeds have evolved over the last several years, let’s now look at how purpose-built all flash appliances are designed to deliver the best performance in terms of the highest IOPS and lowest latencies. The three key areas of focus for these solutions are:

  1. Building a scalable and distributed storage system (high speed redundant controllers, low latency caching delivered by high performance storage devices)
  2. Developing the most efficient caching technique (keep maximum working set in cache, minimize thrashing and I/O to disk)
  3. Minimizing combined storage+network latencies

In short, purpose-built all flash solutions aim to deliver the highest IOPS with predictable, low combined response times (storage + network latencies). Different vendors follow different approaches to achieve this. Some use faster byte-addressable storage such as NVRAM for caching, which accelerates performance and reduces CPU overhead. Others use an RDMA (Remote Direct Memory Access) based approach (which I will get to shortly): they attempt to store metadata entirely in DRAM and then perform high speed metadata updates across hosts over RDMA to drastically improve access times.

Regardless of the approach they follow, the fact remains that these are “purpose-built appliances” with custom hardware to enable faster caching and reduce disk + network latencies. Doing this on standard x86 servers connected over standard Ethernet networks was not considered feasible. Until recently, that is.

How do we Leverage High Speed Storage Interconnects for HCI?

So how has the landscape changed?  What are some of the recent developments that now enable us to leverage high speed storage interconnects for HCI?

As I alluded to earlier, historically there have been a number of factors inhibiting the use of these high speed interconnects in HCI. These include higher cost (NVMe and NVRAM devices have a very high $/Gig), non-standard hardware (NVDIMMs are not considered a standard HCI component) and the need for customers to upgrade their infrastructure (for example, in order to support InfiniBand, customers have to adopt expensive InfiniBand network adapters and switches).

As I mentioned earlier, this landscape is now changing, and it is changing very fast. High speed storage interconnects like NVMe (Non-Volatile Memory Express) are becoming the de facto standard in most enterprises. Take a look at the chart below:

NVMe/PCIe Adoption is Increasing Rapidly


Source:  https://www.einfochips.com/blog/k2-categories/semiconductor/five-factors-fueling-nvme-growth.html

As you can see, PCIe/NVMe is expected to replace SATA/SAS as the de facto flash interface in the enterprise by 2018. The prices of high performance/high endurance NVMe devices (500,000 writes per sec, 10TBW) are declining steeply, making these devices very competitive for use in the caching tier for All Flash. At the same time, low endurance (<1TBW) NVMe devices at lower price points are also gaining popularity and will soon be suitable for use as capacity tier drives.

High Density & High Speed Memory DIMMs

The next big trend we are seeing from flash vendors is innovation in packing a lot of persistent storage into high density, high endurance and high speed memory cards in a DIMM form factor – cases in point being Micron and Intel’s 3D XPoint and Samsung’s 3D NVDIMM.

3D XPoint™ technology is an entirely new class of nonvolatile memory that can help turn immense amounts of data into valuable information in real time. With up to 1,000 times lower latency and exponentially greater endurance than NAND, 3D XPoint technology can deliver game-changing performance for high performance computing, big data and other high speed applications. Its ability to enable high-speed, high-capacity data storage close to the processor creates new possibilities for system architects and promises to enable entirely new applications.  Undoubtedly, this would drive the use of persistent storage towards DIMMs, especially for workloads that require low latencies.

How are these high speed storage interconnects applicable to Virtual SAN?

Let’s first understand how these high speed storage interconnects compare from a technology and use case perspective.

Comparison of High Speed Storage Interconnects


NVMe with Virtual SAN

Virtual SAN supports NVMe today as the caching tier (added to PCIe slots). This is a good fit for highly transactional workloads requiring high IOPS and consistently sub-millisecond latency response times. We are also looking into how to leverage the newer NVMe 1.2 spec and take advantage of multiple queues/namespaces for faster I/O processing and higher space efficiency with Virtual SAN.
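
As a rough illustration of why multiple hardware queues matter, the sketch below (my own simplified example with an assumed device path of /dev/nvme0n1 – not Virtual SAN code) issues concurrent direct reads against an NVMe namespace from several threads. On a multi-queue capable kernel, each thread’s I/O can be submitted on a separate per-CPU submission queue instead of being funneled through the single queue of legacy SATA/AHCI.

    /* Concurrent direct reads against an NVMe namespace (hypothetical /dev/nvme0n1).
     * Each thread reads a different region; with NVMe the kernel can submit these
     * on separate per-CPU hardware queues.  Compile: cc -o qread qread.c -lpthread
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NTHREADS 4
    #define BLOCK    4096

    static const char *dev = "/dev/nvme0n1";   /* assumed device path */

    static void *reader(void *arg) {
        long id = (long)arg;
        int fd = open(dev, O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return NULL; }

        void *buf;
        if (posix_memalign(&buf, BLOCK, BLOCK) != 0) { close(fd); return NULL; }

        /* Read 1024 blocks from a per-thread region of the device. */
        for (long i = 0; i < 1024; i++) {
            off_t off = ((off_t)id * 1024 + i) * BLOCK;
            if (pread(fd, buf, BLOCK, off) != BLOCK) { perror("pread"); break; }
        }
        free(buf);
        close(fd);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, reader, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }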

NVDIMMs with Virtual SAN

Persistent DIMM-based storage is now evolving to the point where you can pack a lot of density into non-volatile memory DIMMs with 1000X the endurance of NAND, making it a very good fit for use cases such as:

  1. High speed & efficient metadata caching
  2. Staging of services like de-duplication & compression.  This can help improve overall speed & efficiency
  3. Reduction of write amplification, by enabling byte-aligned writes on DIMMs instead of 4K-aligned writes on SSDs (see the sketch after this list)
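
To illustrate the byte-aligned write point, here is a minimal sketch using the PMDK libpmem library – purely my own example under the assumption of a DAX-capable filesystem mounted at a hypothetical /mnt/pmem, not something Virtual SAN does today. It persists a small metadata record at byte granularity rather than rewriting a full 4K block.

    /* Byte-aligned persistent write via libpmem (PMDK).
     * Assumes a DAX-capable filesystem mounted at /mnt/pmem (hypothetical).
     * Compile: cc -o pmwrite pmwrite.c -lpmem
     */
    #include <libpmem.h>
    #include <stdio.h>
    #include <string.h>

    struct meta_record {            /* a tiny, byte-granular metadata entry */
        unsigned long long lba;
        unsigned int       len;
        unsigned int       crc;
    };

    int main(void) {
        size_t mapped_len;
        int is_pmem;
        void *base = pmem_map_file("/mnt/pmem/meta", 1 << 20,
                                   PMEM_FILE_CREATE, 0600,
                                   &mapped_len, &is_pmem);
        if (base == NULL) { perror("pmem_map_file"); return 1; }

        struct meta_record rec = { .lba = 42, .len = 4096, .crc = 0xdeadbeef };

        /* Store just sizeof(rec) bytes -- no 4K read-modify-write cycle. */
        memcpy(base, &rec, sizeof(rec));
        if (is_pmem)
            pmem_persist(base, sizeof(rec));   /* flush CPU caches to the DIMM  */
        else
            pmem_msync(base, sizeof(rec));     /* fall back to msync on non-pmem */

        pmem_unmap(base, mapped_len);
        return 0;
    }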

Clearly, today NVDIMMs are the most expensive of the technologies in question, specifically from an HCI perspective.  However, we believe the additional CAPEX investment for these DIMMs will be offset by the benefits they provide in terms of better space efficiency and increased performance compared to SSDs offered in the standard PCIe or even SATA/SAS form factors.  The price of these DIMMs from a $/Gig perspective is also expected to drop over the coming years and we expect this to become a standard offering in commodity x86 servers.

While designing our next generation All Flash architecture, we expect to take the above trends and dense DIMM form factors into consideration in order to deliver the most efficient $/Gig and highest-performing All Flash solution.

How do we Leverage High Speed Network Interconnects for HCI?

Now, let’s take a look at high speed network interconnects, starting with the high speed network protocol RDMA (Remote Direct Memory Access) and how its implementation over converged Ethernet makes it very amenable to use with HCI.

RDMA

Why RDMA?


RDMA enables direct memory access from the memory of one host to that of another without involving either host’s operating system. Data moves directly between application memory and the network: zero-copy transfers save CPU cycles, context switches and memory bandwidth on both hosts, permitting high throughput and extremely fast message transfers compared to conventional transfers over the TCP/IP stack.
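
To give a feel for what this looks like in code, below is a heavily simplified libibverbs sketch (my own illustration, not Virtual SAN code). It registers a local buffer and posts a one-sided RDMA write to a peer, assuming a protection domain, completion queue and already-connected queue pair, with the remote buffer address and rkey exchanged out of band; all of that setup is omitted for brevity.

    /* Minimal one-sided RDMA WRITE using libibverbs.
     * Assumes: a protection domain pd, completion queue cq and an already
     * connected queue pair qp, plus the peer's buffer address and rkey
     * exchanged out of band.  Connection setup is omitted.  Link with -libverbs.
     */
    #include <infiniband/verbs.h>
    #include <stddef.h>
    #include <stdint.h>

    int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp, struct ibv_cq *cq,
                           char *buf, size_t len,
                           uint64_t remote_addr, uint32_t rkey)
    {
        /* Register the local buffer so the NIC can DMA out of it directly. */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
        if (!mr)
            return -1;

        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,
            .length = (uint32_t)len,
            .lkey   = mr->lkey,
        };

        struct ibv_send_wr wr = {0}, *bad_wr = NULL;
        wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided write           */
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;  /* request a completion      */
        wr.wr.rdma.remote_addr = remote_addr;        /* peer buffer (out of band) */
        wr.wr.rdma.rkey        = rkey;

        /* The NIC moves the payload; no remote CPU or kernel copy is involved. */
        if (ibv_post_send(qp, &wr, &bad_wr)) {
            ibv_dereg_mr(mr);
            return -1;
        }

        struct ibv_wc wc;
        while (ibv_poll_cq(cq, 1, &wc) == 0)         /* busy-poll for completion */
            ;
        ibv_dereg_mr(mr);
        return (wc.status == IBV_WC_SUCCESS) ? 0 : -1;
    }

The point to notice is that the CPU only builds and posts a small work request; the NIC moves the payload directly out of registered memory, and the remote host’s CPU and kernel are not involved in the transfer at all.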

How is RDMA applicable in the context of HCI?

Initial implementations of RDMA required customers to have a dedicated InfiniBand network (adapters and switches). InfiniBand is a physical, direct or switched interconnect between servers and storage systems. In the context of HCI, this proved to be a non-starter, as requiring customers to upgrade their physical infrastructure directly conflicts with the simplicity of HCI: HCI allows customers to run traditional storage workloads on commodity x86 servers and standard Ethernet networks, and any kind of physical infrastructure upgrade is non-trivial and expensive.

Newer specs of RDMA such as RoCE (RDMA over Converged Ethernet) and iWARP (Internet Wide Area RDMA Protocol) now allow RDMA to be implemented over standard Ethernet networks. HCI customers do not need to upgrade their network infrastructure as long as the Network Interface Cards (NICs) on their commodity x86 servers are RDMA-enabled. This opens up several possibilities for HCI software like Virtual SAN to take advantage of the high speed, low latency transfers that RDMA has to offer on industry standard x86 hardware.
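
One simple way to check whether a host’s NICs expose RDMA capability (whether via RoCE, iWARP or InfiniBand) is to enumerate the devices visible to the verbs stack. The short sketch below is a generic libibverbs example for illustration, not anything Virtual SAN ships.

    /* List RDMA-capable devices visible to the verbs stack (RoCE, iWARP or IB).
     * Compile: cc -o rdmadev rdmadev.c -libverbs
     */
    #include <infiniband/verbs.h>
    #include <stdio.h>

    int main(void) {
        int num = 0;
        struct ibv_device **list = ibv_get_device_list(&num);
        if (!list || num == 0) {
            printf("No RDMA-capable devices found.\n");
            if (list) ibv_free_device_list(list);
            return 0;
        }
        for (int i = 0; i < num; i++)
            printf("RDMA device: %s\n", ibv_get_device_name(list[i]));
        ibv_free_device_list(list);
        return 0;
    }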

For starters, Virtual SAN could be RDMA-enabled to support high speed access across the distributed hosts in a Virtual SAN cluster and reduce the network bottlenecks associated with inter-host operations. Other potential areas of improvement include, but are not limited to, speeding up operations that incur significant network latency, such as rebuilds during failures, and reducing the CPU (on both source and destination hosts), network bandwidth and latency overhead associated with operations that transfer significant I/O between distributed hosts in a cluster. RDMA also opens up opportunities for us to innovate and further optimize the distributed network, caching and storage algorithms in the Virtual SAN software stack.

Conclusion

  1. Purpose-built All Flash appliances rely on custom-built hardware such as NVRAM and InfiniBand adapters to deliver the best IOPS and efficiency, especially for high performance computing workloads.
  2. The Hyper-Converged Infrastructure landscape is rapidly changing. Adoption of newer hardware technologies (NVMe, NVDIMM, RDMA, etc.) will be driven by standardization and commoditization. The factors contributing to this are a) declining cost, b) major server OEMs offering these technologies as part of their commodity x86 server inventory, and c) the fact that these technologies fit seamlessly into standard HCI infrastructure without requiring a huge network/storage overhaul, which can be disruptive and cost prohibitive.
  3. As All Flash vendors compete to squeeze out the best possible performance and efficiency for High Performance Computing (HPC) workloads, technologies like NVMe, NVDIMM (3D XPoint) and RDMA will become more and more relevant to vendors in the HCI space.
  4. Market data on declining prices and wider enterprise adoption clearly indicate these technologies will soon become commonplace in HCI.  They will no longer be restricted to being part of dedicated hardware and ASICs exclusively provided by All Flash appliance vendors.
  5. At VMware, we recognize that there is immense potential to adopt these high speed technologies for Virtual SAN as we constantly strive to deliver the best HCI All Flash solution. In doing so, we will continue to innovate by enhancing our market-leading All Flash hyper-converged software to adopt these faster storage and networking interconnects, and thereby deliver the best possible performance, space efficiency and reliability for all kinds of virtual workloads.

Keep watching this space to learn about more exciting innovations in Virtual SAN!