NVMe Management Interface Architectural Overview
NVMe™ Management Interface (NVMe-MI™) is targeted at enterprise and hyperscale applications and currently not at client applications. Use cases include: inventorying (asset management), health monitoring (identifying bad drives), wear monitoring (replacing drives nearing wear-out), temperature monitoring (fan throttling), power monitoring and configuration, configuration (format drives, crypto erase), and change management (firmware updates).
Figure 7 shows the NVMe-MI architecture. NVMe-MI is a programmable interface that allows Out-of-Band (OOB) and In-band Management of an NVMe storage device Field Replaceable Unit (FRU). OOB Management operates over hardware resources and components that are independent of operating system control (e.g., a server Baseboard Management Controller (BMC)). OOB Management interfaces include SMBus/I2C and PCIe Vendor Defined Messages (VDM). (See the NVMe Management Interface 1.0 Specification for more details.)
In-band Management (added in the NVMe-MI 1.1 and NVMe 1.3 Specifications) allows an application to tunnel NVMe-MI Commands through the NVMe driver's Admin Queue using the NVMe-MI Send and NVMe-MI Receive Commands. The main benefit of In-band Management is management without the need for an OOB NVMe-MI driver; it provides NVM subsystem health status reporting, Vital Product Data (VPD) access, and enclosure management. Although not defined in its scope, it can also be applied to NVMe-oF using the NVMe-MI Send/Receive Commands.
Figure 7: NVMe Management Interface Architecture
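To make the in-band tunneling concrete, the sketch below packs a simplified 64-byte Admin Submission Queue Entry carrying the NVMe-MI Send Admin opcode (0x1Dh; NVMe-MI Receive is 0x1Eh, per the NVMe 1.3 Admin command set). The field layout here is deliberately reduced for illustration: only command Dword 0 and Dwords 10-11 are populated, and no real NVMe-MI message payload is attached.

```python
import struct

NVME_MI_SEND = 0x1D     # Admin opcode for tunneling NVMe-MI requests (NVMe 1.3)
NVME_MI_RECEIVE = 0x1E  # Admin opcode for retrieving tunneled responses

def build_admin_sqe(opcode: int, cid: int, cdw10: int = 0, cdw11: int = 0) -> bytes:
    """Pack a simplified 64-byte Admin Submission Queue Entry.

    Only command Dword 0 (opcode/command identifier) and Dwords 10-11 are
    filled in; the PRP/SGL data-pointer fields are left zero for illustration.
    """
    cdw0 = opcode | (cid << 16)  # opcode in bits 7:0, CID in bits 31:16
    return struct.pack("<16I", cdw0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                       cdw10, cdw11, 0, 0, 0, 0)

sqe = build_admin_sqe(NVME_MI_SEND, cid=7)
print(len(sqe), hex(sqe[0]))  # → 64 0x1d
```

In a real system this entry would be submitted through the controller's Admin Queue by the NVMe driver; the sketch only shows the framing that lets an NVMe-MI command ride inside an ordinary Admin command.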
The OOB NVMe-MI layer makes use of the Management Component Transport Protocol (MCTP), commonly supported in server platforms, with bindings to SMBus/I2C or PCIe VDM (see the NVMe Management Interface 1.0 Specification for more details).
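On the SMBus/I2C binding, each MCTP transaction is protected by the SMBus Packet Error Code (PEC), a CRC-8 over the transaction bytes (polynomial x^8 + x^2 + x + 1, i.e., 0x07, initial value 0). A minimal sketch of that checksum:

```python
def smbus_pec(data: bytes) -> int:
    """CRC-8 as used for the SMBus Packet Error Code.

    Polynomial 0x07, initial value 0x00, no bit reflection, no final XOR.
    """
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift left; XOR in the polynomial whenever the MSB falls out.
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

print(hex(smbus_pec(b"123456789")))  # → 0xf4 (standard CRC-8 check value)
```

A management controller appends this byte to each SMBus block write so the endpoint can detect corrupted MCTP packets on the wire.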
The NVM storage device being managed consists of an NVM subsystem with one or more ports and an optional SMBus/I2C interface, as shown in Figure 8. NVMe-MI Commands are intended for PCIe-connected NVMe storage device Field Replaceable Units (FRUs).
Figure 8: Example NVMe-MI Managed Storage Device
Management Interface Specification Readiness:
As of August 2018, the NVM Express organization has released the NVMe™ Management Interface 1.0 Specification. For a summary of the released NVMe Management Interface 1.0 Specification and the draft 1.1 Work-In-Progress specification, see the box NVMe™ Management Interface Specifications and Roadmap.
NVMe™ Management Interface Specifications and Roadmap
The NVMe™ Management Interface (NVMe-MI™) 1.0 Specification was released in 2015. NVMe-MI 1.0 provides commands that determine NVM subsystem health status (e.g., unrecoverable errors, reset required, PCIe status, Controller SMART / Health Information, composite temperature, and controller status), a Controller Health Status Poll that detects changes in health status attributes associated with one or more Controllers in the NVM subsystem, VPD Read/Write, NVM subsystem reset, etc.
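To make the health-poll idea concrete, the sketch below decodes a few of the attributes such a poll reports: subsystem status flags, SMART warning flags, composite temperature, and percentage drive life used. The 4-byte layout and field offsets here are a simplification for illustration, not the exact structure defined in the specification; composite temperature is treated as a two's-complement value in degrees Celsius.

```python
import struct
from typing import NamedTuple

class SubsystemHealth(NamedTuple):
    nss: int    # NVM subsystem status flags
    sw: int     # SMART warning flags
    ctemp: int  # composite temperature, degrees Celsius (two's complement)
    pdlu: int   # percentage drive life used

def parse_health(buf: bytes) -> SubsystemHealth:
    """Decode a simplified 4-byte health snapshot (illustrative layout)."""
    nss, sw, ctemp, pdlu = struct.unpack("<BBbB", buf[:4])
    return SubsystemHealth(nss, sw, ctemp, pdlu)

h = parse_health(bytes([0x00, 0x00, 0x28, 0x05]))
print(h.ctemp, h.pdlu)  # → 40 5  (40 °C, 5% drive life used)
```

A BMC polling this structure periodically can flag a drive for replacement when the life-used percentage climbs, or trigger fan throttling on a rising composite temperature, without any help from the host operating system.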
The NVMe-MI 1.1 Specification is a work in progress and is expected to be ratified in 2018. NVMe-MI 1.1 supports tunneling NVMe-MI Commands in-band via an NVMe Admin Queue through an NVMe driver. NVMe-MI 1.1 will also support managing enclosure controllers, along with elements in the enclosure such as fans, LEDs, temperature sensors, etc.
Some major NVMe-MI 1.1 work items in a mature state include support for In-band NVMe-MI, SES-Based Enclosure Management, the NVM Storage Device Extension, etc. NVMe-MI leverages SCSI Enclosure Services (SES) for enclosure management and uses the same control and status diagnostic pages, but transfers them using the SES Send and SES Receive Commands.
For additional details see www.nvmexpress.org.
Management Interface Driver Readiness:
Currently (August 2018), there is very limited support for NVMe-MI in servers. The 14th-generation Dell PowerEdge servers support NVMe SSDs based on the industry-standard NVMe-MI Specification: http://www.dell.com/support/manuals/us/en/04/poweredge-r940/idrac_3.00.00.00_ug/managing-pcie-ssds?guid=guid-888a99b6-bd06-4ec5-bf79-5acd59820b60&lang=en-us
Acknowledgments
I would like to thank my colleagues at VMware, especially Christos Karamanolis and Cormac Hogan, and David Black and Austin Bolen at Dell EMC, for their valuable reviews.
Additional posts in this series: