Last week I had the opportunity to attend Intel Partner Connect and catch up on the new updates coming from Intel in areas like data center, IoT, and artificial intelligence. One thing is certain: Intel is shifting its focus from PC-centric to data-centric. Taking center stage in Intel's data center solutions was VMware vSAN with Intel Optane and 3D NAND. I had some great conversations with several Intel partners on how best to design an HCI solution using vSAN that delivers incredible performance and tremendous value to customers. In this post I want to share a few things to consider when choosing storage for your vSAN cluster.
Filling the Memory and Storage Gap
There’s an old saying in software development that more or less goes, “Fast, cheap, good… pick two.” If you replace “good” with “density,” it applies well to the memory and storage gap we have seen for many years, where there have been massive tradeoffs between placing data on cheap but extremely slow hard disk drives or placing it in very fast but very expensive DRAM. NAND (2D and then 3D) helped bridge this gap, but the need for higher performance, higher capacity, and lower cost still existed. Then Intel and Micron announced a new type of memory called 3D XPoint that is faster than NAND, denser than DRAM, and nonvolatile. VMware began working with Intel on the much anticipated enterprise SSD form of 3D XPoint called Optane, and last spring announced day 1 support on vSAN 6.5 and ESXi 6.5 with the Intel® Optane™ SSD DC P4800X.
What is 3D XPoint?
3D XPoint is a non-volatile memory (NVM) technology developed by Intel and Micron that is about four times denser than DRAM and, while more expensive than NAND flash, is significantly faster when it comes to latency and reading/writing small bits of data. Intel’s enterprise SSD form of 3D XPoint, branded “Optane,” is built for applications with heavy read/write loads that demand low latency.
Intel Optane is based on a revolutionary material not found in NAND. In terms of material, traditional 2D flash uses “floating-gate” MOSFET technology to hold charge, while 3D NAND uses Charge Trap Flash (CTF) technology. Without diving into an entire lesson on reading and writing phase-change memory, I’ll just say the 3D XPoint stackable, cross-gridded data access array design eliminates the need for transistors and is a completely new and significantly faster way to switch between 1s and 0s.
Intel Optane is bit addressable and write-in-place, meaning each individual byte can be overwritten. This sets it apart from NAND flash, where reading and writing take place at page granularity but each page can only be written once. To write a page again, it must first be erased, and erasure takes place at block granularity, which can be time consuming depending on the number of blocks that need to be shuffled around prior to the erase. Because 3D XPoint has a write-in-place mechanism there is no need to erase before writing, and both reads and writes can occur on individual bytes.
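To make the difference concrete, here is a toy Python sketch contrasting NAND-style erase-before-write at block granularity with a write-in-place medium that simply overwrites bytes. This is purely illustrative; real devices hide all of this behind firmware and a flash translation layer, and the sizes and classes below are assumptions for the sake of the example.

```python
# Toy model: NAND-style erase-before-write vs. a write-in-place medium.
# Illustrative only; not how any real controller or FTL is implemented.

PAGES_PER_BLOCK = 64   # NAND programs pages but erases whole blocks


class ToyNandBlock:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK   # None = erased/clean

    def write_page(self, page_no, data):
        if self.pages[page_no] is not None:
            # Page already programmed: the whole block must be erased
            # first, so live pages are copied out and rewritten.
            live = {i: p for i, p in enumerate(self.pages)
                    if p is not None and i != page_no}
            self.erase()                          # block-level erase
            for i, p in live.items():             # write amplification
                self.pages[i] = p
        self.pages[page_no] = data

    def erase(self):
        self.pages = [None] * PAGES_PER_BLOCK


class ToyWriteInPlaceMedium:
    def __init__(self, size):
        self.cells = bytearray(size)

    def write(self, offset, data):
        # Individual bytes are simply overwritten; no erase cycle,
        # no copying of neighbouring data.
        self.cells[offset:offset + len(data)] = data


block = ToyNandBlock()
block.write_page(0, b"A")
block.write_page(0, b"B")    # triggers a block erase plus rewrite of live pages

medium = ToyWriteInPlaceMedium(1024)
medium.write(0, b"A")
medium.write(0, b"B")        # bytes overwritten in place, nothing else moves
```

The copy-then-erase loop in the NAND model is the source of the write amplification and garbage-collection overhead that a write-in-place medium avoids.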
vSAN Disk Groups Using Intel SSDs
All hosts participating in a vSAN cluster with local storage have their local disks configured in disk groups. One disk is designated as the “cache tier” and the remaining disks as the “capacity tier.” vSAN can be architected in two ways: Hybrid vSAN and All-Flash vSAN. As this post is dedicated to using Intel SSDs, I will only discuss All-Flash vSAN. For details on all vSAN configurations, be sure to visit storagehub.vmware.com.
In an All-Flash vSAN, the cache tier serves all writes, so it is preferable to use SSDs with very low latency and very high endurance there. The capacity tier serves all reads, so it is preferable to use SSDs with higher capacity but lower cost and endurance than the cache tier. Adding Intel Optane SSDs at the cache tier delivers better performance, better error handling, more power efficiency, and better TCO. Before designing your vSAN cluster, be sure to review the vSAN Design and Sizing Guide.
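As a rough illustration of that tiering logic, the following hypothetical Python sketch sorts a host's candidate SSDs into cache and capacity roles by latency and endurance. The device names, numbers, and selection rule are assumptions for illustration only and are no substitute for the official vSAN Design and Sizing Guide.

```python
# Hypothetical helper: sort a host's SSDs into vSAN cache and capacity
# roles. Device specs and thresholds are illustrative, not official
# VMware sizing guidance.

from dataclasses import dataclass


@dataclass
class CandidateSsd:
    model: str
    capacity_gb: int
    endurance_dwpd: float     # drive writes per day (illustrative)
    write_latency_us: float   # typical small-write latency (illustrative)


def pick_tiers(ssds):
    """Return (cache_device, capacity_devices) for one disk group."""
    # Cache tier: favour latency and endurance over capacity,
    # since it absorbs every write in an all-flash design.
    cache = min(ssds, key=lambda d: (d.write_latency_us, -d.endurance_dwpd))
    # Capacity tier: everything else, where cost per GB matters most.
    capacity = [d for d in ssds if d is not cache]
    return cache, capacity


host_ssds = [
    CandidateSsd("Optane DC P4800X", 375, 30.0, 10.0),
    CandidateSsd("3D NAND SSD A", 2000, 1.0, 80.0),
    CandidateSsd("3D NAND SSD B", 2000, 1.0, 80.0),
]

cache, capacity = pick_tiers(host_ssds)
print(f"Cache tier: {cache.model}")
print(f"Capacity tier: {[d.model for d in capacity]}")
```

The design intent is simply that the low-latency, high-endurance device lands in front of every write, while the cheaper, denser devices hold the bulk of the data.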
Performance Results: Up to 2.5X More IOPS
You may recall that last spring VMware announced day 1 support for the Intel® Optane™ SSD DC P4800X, with breakthrough performance enabling up to a 2.5X increase in IOPS and up to a 2.5X reduction in disk latencies. Intel and VMware performed extensive joint testing, running the Optane SSDs both in a bare ESXi server and as the caching layer in an all-flash vSAN configuration. The joint Intel Optane and VMware tests delivered the following results:
- Optane NVMe on ESXi delivers up to 95% of the raw device IOPS (both read and write) when there are multiple applications writing directly to the NVMe device
- Optane NVMe as the caching tier for VMware vSAN enables up to 2.5X greater IOPS* for write-intensive workloads when compared to the same tests run with the Intel P3700 NVMe SSD as the caching tier. In addition, the Optane tests revealed up to a 2.5X reduction in disk latencies* for write-intensive workloads.
Intel’s 3D XPoint technology, branded Optane, delivers an unparalleled combination of high throughput, low latency, high QoS, and ultra-high endurance. You can learn more here: Intel® Optane™ SSD DC P4800X.
Ready to evolve your IT infrastructure with vSAN? Then read more on How to Get Started with HCI.
A few months back on the Virtually Speaking Podcast, John Nicholson and I had the opportunity to discuss Intel SSDs with Intel’s own James Myers.
*Tests were run in a controlled environment and the actual performance improvements will vary depending on the environment and the nature of the specific write-intensive workloads in question.
@vPedroArrow
Hi Pete,
Enjoyed your post. Just a note to correct some statements in the post and provide clarity.
1) First, the old saying is not from software dev but hardware design, dating back several decades. The saying is “I can design it to be fast, cheap, or reliable, pick any two.” Notice the last word – reliable. Now, you may have assumed that in using the word “good”, you incorporated ‘reliable’, but the real phrase is a bit different. The interesting thing about 3DX is that compared to DRAM, it’s cheap but it’s not fast or reliable. Compared to NAND, it’s fast and reliable, but not cheap. So, depending on your reference point, you either get one or two out of three 🙂
2) Your description of 3DX is slightly off the mark. Strictly speaking, in the memory taxonomy, 3DX is a form of persistent memory (PM). This is a subset of a larger group, called non-volatile memory (NVM). Read the SNIA definitions of the terms – PM is byte-addressable and uses memory semantics (load/store), while the larger category NVM is block-addressable and uses storage semantics (read/write). Turns out that 3DX can do both – but because it does the former, it’s classified as persistent memory. Please use that term going forward when describing 3DX.
3) You also say that 3D NAND is charge-trap. That’s not completely true. There are forms of 3D NAND which are floating-gate, and are quite popular in the marketplace.
4) Finally, you mention that 3DX is phase-change memory. Again, that’s not completely true. 3DX is not exactly PCM in terms of how PCM was originally developed and implemented. Now, what 3DX is -exactly-, I won’t say. You are correct when you say 3DX is a transistor-less design, at least for the media itself. Obviously, there are plenty of transistors around it for logic and control from the host.
Thanks for posting the article – I enjoyed reading it.
We had a huge vSAN deployment with Optane 375GB drives for cache in all-NVMe Dell configs. Great IOPS... HOWEVER!!! *WARNING* Optane is so fast it constantly fills the small amount of cache (relative to the data storage space) and creates constant de-staging spikes, which saturate the network and cause regular dropped packets at peak periods. Optane drives may not be worth it unless you double the write cache drive to ~750GB+ (though ONLY 600GB of cache is used by vSAN) AND double your network ports and ensure 25Gb or 100Gb NICs.
Correct. You were experiencing exactly what I describe. It is one of the reasons why I would always recommend the largest Optane device that is close to the logical limit of the write buffers (currently 600GB, but subject to change). Thanks for sharing.
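For readers weighing that tradeoff, here is a rough back-of-the-envelope Python sketch of the concern raised in the exchange above. Every number in it (cache ingest rate, NIC speed, the 600GB buffer figure applied this way) is a hypothetical assumption and will vary by hardware, workload, and vSAN version.

```python
# Back-of-the-envelope sketch of the de-staging concern discussed above.
# All numbers are hypothetical; real behaviour depends on workload,
# hardware, vSAN version, and the de-staging algorithm.

write_buffer_gb = 600        # logical write-buffer limit per disk group (per the comments)
cache_ingest_gbps = 2.0      # rough GB/s a fast cache SSD might absorb (illustrative)
nic_gbps = 10 / 8            # a 10GbE link expressed in GB/s

# How long a sustained write burst takes to fill the buffer:
fill_seconds = write_buffer_gb / cache_ingest_gbps
print(f"Buffer fills in ~{fill_seconds / 60:.0f} minutes under sustained writes")

# If de-staging plus replica traffic must move data at anything close to
# the rate it arrives, the network becomes the bottleneck long before the SSD:
print(f"Cache can ingest ~{cache_ingest_gbps:.1f} GB/s, "
      f"but a 10GbE NIC carries only ~{nic_gbps:.2f} GB/s")
```

The point is not the specific numbers but the shape of the problem: the faster the cache device, the more the network and the de-staging path set the ceiling.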