
NVMe Readiness – Part Three

NVMe Specification Readiness

NVM Express® is an open collection of standards and information, and the NVM Express organization (www.nvmexpress.org) is responsible for developing the NVMe specifications. To date (August 2018), it has released several specifications:

  • Base NVMe Specifications, including NVMe 1.0e, 1.2.1 and 1.3
  • NVMe™ Over Fabrics 1.0 Specification
  • NVMe™ Management Interface 1.0 Specification

For a summary of the released 1.2.1 and 1.3 Base NVMe Specifications and the work-in-progress draft 1.4 specification (or whatever the NVMe standards body ultimately chooses to name the next revision), see Base NVMe™ Specifications and Roadmap.

For a summary of the released NVMe Over Fabrics 1.0 Specification and the work-in-progress draft 1.1 specification, see NVMe™ Over Fabrics Specifications and Roadmap.

For a summary of the released NVMe Management Interface 1.0 Specification and the work-in-progress draft 1.1 specification, see NVMe™ Management Interface Specifications and Roadmap.

Base NVMe Specifications and Roadmap  

Some of the major features of the NVMe 1.2.1 Specification include Compare and Write Fused Operation, Namespace Sharing and Multi-path I/O, NVMe Reservations, and End-to-end Data Protection.

  • NVMe Compare & Write is a fused, atomic command that is used in the same way as SCSI Compare & Write
  • Namespace Sharing and Multi-path I/O are both relevant in Fabrics
  • NVMe Reservations are used in the same way as SCSI-3 Persistent Reservations
  • End-to-end Data Protection is compatible with T10 DIF/DIX and provides data protection via the Data Integrity Field (DIF)/Data Integrity Extension (DIX). The protection type (Type 1, Type 2, or Type 3) is selected when a namespace is formatted
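
To make the End-to-end Data Protection bullet more concrete, here is a minimal Python sketch of the 8-byte protection information used by T10 DIF: a 16-bit guard (CRC-16 with polynomial 0x8BB7), a 16-bit application tag, and a 32-bit reference tag. The function names are hypothetical and the Type 1 check shown (reference tag equals the low 32 bits of the LBA) is a simplified illustration, not a driver or firmware implementation.

```python
import struct

T10_DIF_POLY = 0x8BB7  # generator polynomial of CRC-16/T10-DIF


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: initial value 0, no bit reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_pi(block: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Pack guard, application tag, and reference tag as 8 big-endian bytes."""
    return struct.pack(">HHI", crc16_t10dif(block), app_tag, ref_tag)


def verify_type1(block: bytes, pi: bytes, lba: int) -> bool:
    """Simplified Type 1 check: guard matches the data, ref tag matches the LBA."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(block) and ref_tag == (lba & 0xFFFFFFFF)


if __name__ == "__main__":
    data = bytes(range(256)) * 2          # one 512-byte logical block
    pi = build_pi(data, app_tag=0, ref_tag=42)
    print(verify_type1(data, pi, lba=42))  # block written to and read from LBA 42
```

Because the guard travels with the data from host to media and back, corruption or a misdirected write (wrong LBA) can be detected at any point that re-runs the check.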

The NVMe 1.3 Specification was released in June 2017 and added new features including Boot, Emulated Controller Performance Enhancement, Sanitize, Streams, Telemetry, NVMe-MI Send, and NVMe-MI Receive, among others.

  • Boot provides a simplified interface to access Boot Partitions that may be used for platform initialization code.
  • Emulated Controller Performance Enhancement reduces the number of PCIe doorbell writes performed by a guest OS. This avoids the VM trap into the hypervisor that is caused by a doorbell write.
  • Sanitize thoroughly destroys data in an SSD and may be important for repurposing and removal of potentially sensitive data before replacing or discarding a device
  • Streams can reduce write amplification and Garbage Collection (GC), which potentially extends SSD life (endurance) and improves system performance
  • Telemetry enables manufacturers to collect internal data logs to improve the functionality and reliability of products and to extract Firmware crash dumps
  • NVMe-MI Send/Receive commands allow a management application to tunnel management commands through the normal NVMe driver, avoiding the need for an out-of-band (OOB) NVMe-MI driver. (NVMe-MI is the Management Interface, which will be described later.)
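
The Emulated Controller Performance Enhancement bullet above can be illustrated with a small simulation. In this hedged sketch, the guest keeps a shadow copy of the doorbell in shared memory and only performs the expensive MMIO doorbell write (the one that traps into the hypervisor) when the new value steps past an event index published by the emulated controller. The class and function names are hypothetical, the queue indexes are simplified to plain 16-bit counters, and the wraparound comparison is of the kind used for virtio event indexes.

```python
U16 = 0xFFFF  # indexes are compared modulo 2**16 in this simplified model


def needs_mmio_write(event_idx: int, new_idx: int, old_idx: int) -> bool:
    """True when moving old_idx -> new_idx steps past event_idx (wraparound-safe)."""
    return ((new_idx - event_idx - 1) & U16) < ((new_idx - old_idx) & U16)


class ShadowDoorbell:
    """Toy model of one submission queue doorbell seen from the guest driver."""

    def __init__(self) -> None:
        self.shadow = 0        # guest-written shadow doorbell in shared memory
        self.event_idx = 0     # written by the emulated controller
        self.mmio_writes = 0   # count of real doorbell writes (VM traps)

    def ring(self, new_tail: int) -> None:
        old = self.shadow
        self.shadow = new_tail & U16          # always update the cheap shadow copy
        if needs_mmio_write(self.event_idx, self.shadow, old):
            self.mmio_writes += 1             # only now touch the real doorbell


if __name__ == "__main__":
    db = ShadowDoorbell()
    db.ring(1)                 # controller is idle, so this write must trap
    db.event_idx = 10          # controller says: I will poll the shadow up to 10
    for tail in range(2, 12):
        db.ring(tail)
    print(db.mmio_writes)      # far fewer traps than doorbell updates
```

The design point is that the common case becomes a plain memory write, and the trap is reserved for the moment the controller has asked to be notified.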

The NVMe 1.4 Specification is a work in progress and includes new features such as Persistent Memory Region (PMR), IO Determinism (IOD), Asymmetric Namespace Access (ANA), Dispersed Namespaces, and Namespace Write Protect, among others. NVMe 1.4 is expected to be ratified in 2019.

  • PMR provides NVDIMM-like semantics over an existing PCIe interconnected SSD.
  • IOD defines mechanisms to achieve a guaranteed, deterministic bound on I/O latency. IOD also addresses workload interference, known as the Noisy-Neighbor problem, especially for reads.
  • ANA supports multipathing and is useful when namespace access characteristics may vary based on the internal configuration of the NVM subsystem or on which controller is used to access a namespace (e.g., Fabrics)
  • Dispersed Namespaces deals with use cases requiring access across NVM subsystems, e.g., active-active storage replication and/or online storage migration.
  • Namespace Write Protection may be used to prevent modification of the specified namespace and is intended primarily for client devices, e.g., smartphones, where one wants to protect the OS from applications after boot.
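
As an illustration of the ANA bullet above, the following Python sketch shows how a multipath layer might pick a controller path for a namespace based on the reported access state. The state names and numeric values follow the ANA states as they appear in the later ratified specification, which is an assumption relative to the draft discussed here; the `Path` type and `select_path` function are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class AnaState(Enum):
    OPTIMIZED = 0x01
    NON_OPTIMIZED = 0x02
    INACCESSIBLE = 0x03
    PERSISTENT_LOSS = 0x04
    CHANGE = 0x0F


@dataclass(frozen=True)
class Path:
    controller: str      # hypothetical label for the controller behind this path
    state: AnaState      # access state reported for the namespace on this path


# Only these states can serve I/O; a lower rank is preferred.
_RANK = {AnaState.OPTIMIZED: 0, AnaState.NON_OPTIMIZED: 1}


def select_path(paths: Sequence[Path]) -> Optional[Path]:
    """Prefer an optimized path, fall back to non-optimized, else return None."""
    usable = [p for p in paths if p.state in _RANK]
    if not usable:
        return None
    return min(usable, key=lambda p: _RANK[p.state])


if __name__ == "__main__":
    paths = [
        Path("ctrl-A", AnaState.NON_OPTIMIZED),
        Path("ctrl-B", AnaState.OPTIMIZED),
        Path("ctrl-C", AnaState.INACCESSIBLE),
    ]
    print(select_path(paths))
```

A real host would also react to state-change notifications and re-evaluate the choice; the sketch only covers the selection step.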

For additional details see www.nvmexpress.org

NVMe SSD Readiness:

Currently (August 2018), most vendor SSDs support the Base NVMe 1.2.1 Specification. We expect to see increasing support for NVMe 1.3 in 2019. Note that claiming specification compliance does not imply support for all of the optional features.
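
Since compliance with a specification revision says nothing about optional features, a host has to inspect the capability bit fields that a controller reports in its Identify Controller data. The sketch below decodes a few bits of the OACS (Optional Admin Command Support) and ONCS (Optional NVM Command Support) fields. The bit positions reflect the NVMe 1.3 layout as best understood here and should be checked against the specification; the sample values are made up for illustration and do not describe any real device.

```python
from typing import Dict, List

# Selected OACS bits (Optional Admin Command Support).
OACS_BITS: Dict[int, str] = {
    1: "Format NVM",
    2: "Firmware Commit / Image Download",
    3: "Namespace Management",
    6: "NVMe-MI Send / Receive",
    8: "Doorbell Buffer Config",
}

# Selected ONCS bits (Optional NVM Command Support).
ONCS_BITS: Dict[int, str] = {
    0: "Compare",
    2: "Dataset Management",
    5: "Reservations",
}


def decode_bits(value: int, table: Dict[int, str]) -> List[str]:
    """Return the names of all features in `table` whose bit is set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


if __name__ == "__main__":
    sample_oacs = 0x000E   # hypothetical value: bits 1, 2 and 3 set
    sample_oncs = 0x0004   # hypothetical value: only Dataset Management set
    print("Admin:", decode_bits(sample_oacs, OACS_BITS))
    print("NVM:  ", decode_bits(sample_oncs, ONCS_BITS))
```

With the sample values, the device would accept Format NVM, firmware updates, and namespace management, but not Compare or Reservations, even though it could still claim compliance with a given specification revision.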

The following summarizes current support for some NVMe attributes and features of interest from six major SSD vendors (see the Base NVMe Specifications and Roadmap box for the use cases of these features). These SSD vendors are also VMware storage partners, and their indicated support means support in at least one of their products.

  • NGUID and EUI-64 Namespace Identifiers: Good current and future support
  • Dual-Port: All vendors indicated current support and most indicated future support. This is an important feature in support of namespace sharing and multi-path I/O
  • Sanitize: Good current and future support
  • NVMe Reservations: Current support from half the vendors and good future support
  • Compare & Write: No current support for the Compare & Write Fused Operation feature; about half indicated future support
  • End-to-End Data Protection: Current support for end-to-end data protection (DIF) from half the vendors; future support looks good
  • Multi-Streams: No current support for the Multi-Streams Directive, although most indicated future support
  • SR-IOV: Current support for SR-IOV (basic) from only one SSD vendor; future support does not look promising

OS Readiness:

On the host side, there appears to be excellent OS support for NVMe, including Linux (RHEL/SUSE/Ubuntu), Windows, and VMware. Please check http://www.nvmexpress.org/drivers/ for the latest driver details.

The NVMe 1.2.1 driver is supported in vSphere 6.7 and supports some optional features such as Firmware Update Process, Namespace Management, Namespace Attachment, Deallocate (Dataset Management), and Format NVM, among others.

vSphere 6.7 supports a virtual NVMe device called vNVMe that is designed to reduce CPU cost per I/O and improve IOPS scalability for all-flash storage. vNVMe presents an emulated PCIe NVMe device to the guest OS (GOS) and uses the GOS NVMe stack instead of the SCSI stack, which leads to improved performance. vNVMe currently supports the native NVMe 1.0e driver in several guest operating systems. Expect to see future releases of vNVMe for the GOS native NVMe 1.2.1/1.3 drivers. See https://kb.vmware.com/s/article/2147714 for current OS support.

Fabric Specification Readiness:

To date (August 2018), the NVM Express organization has released the NVMe™ Over Fabrics 1.0 Specification. The specification requires the underlying NVMe transport to provide reliable NVMe command and data delivery. The transport is specified as an abstract protocol layer independent of any physical interconnect properties. For a summary of the released NVMe Over Fabrics 1.0 Specification and the work-in-progress draft 1.1 specification, see the box NVMe™ Over Fabrics Specifications and Roadmap.

NVMe Over Fabrics Specifications and Roadmap

The NVMe Over Fabrics 1.0 Specification was released in June 2016. It includes the RDMA (RoCE, IB, iWARP) and FC (including FCoE) transports and supports a Discovery Service. Note that FC-NVMe is specified by a separate INCITS T11 document; see www.t11.org.

The NVMe Over Fabrics 1.1 Specification is a work in progress that covers the TCP transport and is expected to be ratified later in 2018. It is also expected to add Enhanced Discovery and Graceful Termination, among other features.

Note that the draft Base NVMe 1.4 Specification includes two technical proposals that primarily apply to fabrics: Asymmetric Namespace Access (ANA), which provides multipathing support when namespace access characteristics vary within an NVM subsystem, and Dispersed Namespaces, which covers access across NVM subsystems.

For additional details see www.nvmexpress.org.

Fabric Transport Readiness

Currently, it appears that NVMe-oF/RDMA (RoCEv2) and NVMe-oF/FC are the two leading fabric transports among the HBA/NIC and flash array vendors. InfiniBand fabric transport support exists, but from very few vendors. See the list of vendor products at http://nvmexpress.org/products/.

Fabric OS Readiness

VMware is planning to support NVMe over Fabrics, initially with the NVMe-oF/FC and NVMe-oF/RDMA transports. The VMware-developed NVMe-oF/RDMA initiator driver will be an inbox driver and should work with any certified RoCEv2 driver. NVMe-oF/FC support will be added to the HBA drivers supplied by the respective FC HBA partners. Expect to see VMware support for both NVMe-oF/RDMA and NVMe-oF/FC in a future vSphere release.

Plug-fest and Interop

The NVM Express Interop Workgroup is developing conformance and compliance testing in collaboration with UNH-IOL. The UNH InterOperability Laboratory conducts periodic interop tests with SSD, array, and host vendors and publishes the NVMe Integrator’s List: https://www.iol.unh.edu/registry/nvme

A recent (October 2017) FCIA NVMe-oF/FC plugfest, conducted at the University of New Hampshire InterOperability Laboratory (UNH-IOL), had participation from nine vendors: https://www.hpcwire.com/off-the-wire/second-industry-wide-multi-vendor-plugfest-focused-nvme-fibre-channel-fabric-completed-fibre-channel-industry-association/

Note: I’ll be covering the NVMe™ Management Interface in Part 4.