In VMware Virtual SAN 6.2, we introduced several features highly requested by customers, such as deduplication and compression. An overview of these features can be found in the blog post: Virtual SAN 6.2 – Deduplication And Compression Deep Dive.
The deduplication feature benefits an all-flash Virtual SAN environment the most: although SSDs cost more than spinning disks, deduplication amortizes that cost by fitting more workloads onto the same SSD capacity. Therefore, our performance testing was performed on an all-flash Virtual SAN cluster with deduplication enabled.
When testing the performance of the deduplication feature for Virtual SAN, we observed the following:
- Unexpected deduplication ratio
- High device read latency in the capacity tier, even though the SSD hardware was healthy
In this blog, we discuss the reasons behind these two issues and share our testing experience. Our original test procedure consisted of two steps:
- We used IOBlazer to populate the disks. We configured IOBlazer to run 100% large sequential writes. This was to make sure all the blocks were allocated before testing any read-related workload. Some people prefer to zero out all the blocks using the dd command, which has a similar effect.
- We then ran an Iometer workload. We set the read percentage, randomness, I/O size, number of outstanding I/Os, and so on.
We found, however, that there were two issues with the above procedure when testing the deduplication feature:
- Iometer did not support configuring I/O content. In other words, we could not use Iometer to generate I/Os with various deduplication ratios in step 2.
- We should not have populated the disks using IOBlazer or dd in step 1 because each utility pollutes the disks with random data or zeros, both of which yield the wrong deduplication ratio for later tests.
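To see why zero-filled or random data skews results, consider a short Python sketch (our illustration, not VMware code) that computes a deduplication ratio by hashing 4KB chunks, the same unit Virtual SAN uses:

```python
import hashlib
import os

CHUNK = 4096  # Virtual SAN computes its deduplication ratio over 4KB chunks

def dedup_ratio(data: bytes) -> float:
    """Fraction of 4KB chunks that duplicate an earlier chunk."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return 1 - len(unique) / len(chunks)

zeroed = bytes(1024 * CHUNK)          # disk prepared with dd from /dev/zero
rand_data = os.urandom(1024 * CHUNK)  # disk prepared with random writes

print(dedup_ratio(zeroed))     # ~0.999: everything collapses to one block
print(dedup_ratio(rand_data))  # 0.0: nothing deduplicates
```

Zero-filled disks deduplicate almost completely, so later reads concentrate on very few physical blocks, while purely random data deduplicates not at all; neither matches the ratio a test intends to measure.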
To address these issues, we decided to use the Flexible I/O (FIO) benchmark to both populate the disks and run the tests. FIO allowed us to specify the deduplication ratio. By following these steps, we were able to successfully test the deduplication feature in Virtual SAN 6.2:
- Run FIO with 100% 4KB sequential write with the given deduplication and compression ratio. This will populate the disks with the desired deduplication and compression ratio.
- Run FIO with the specified read/write percentage, I/O size, randomness, number of outstanding I/Os, and deduplication and compression ratio.
Below is a sample configuration file for FIO. We modified the parameters for different tests.
```
[global]
; async I/O engine for Linux
ioengine=libaio
direct=1
; use thread rather than process
thread
group_reporting
; Test name: 4K_rd70_rand100_dedup0_compr0
runtime=3600
time_based
readwrite=randrw
iodepth=8
rwmixread=70
blocksize=4096
randrepeat=0
blockalign=4096
buffer_compress_percentage=0
dedupe_percentage=0

[job 1]
filename=/dev/sdb
filesize=25G

[job 2]
filename=/dev/sdc
filesize=25G
```
If steps 1 and 2 are not performed properly, the results can be unexpected. To illustrate this, we describe two issues we encountered.
Issue #1: The SSD showed high read latency, but the SSD hardware had no issues
We observed a high device read latency issue with FIO micro-benchmarks. The high read latency occurred because we were issuing a large amount of concurrent I/O (outstanding I/O, also known as OIO) to the same Logical Block Address (LBA), or a small range of LBAs, on the SSD. This is more likely to happen with any type of deduplication solution, regardless of the storage vendor.
To resolve this issue, we first ran a test to learn the behavior of the SSD device. The results below show the read latency of reads to a single address as the number of outstanding I/Os increases.
4KB reads from the same LBA:

| Outstanding I/Os | Read latency |
|------------------|--------------|
| 1                | 0.12 ms      |
| 16               | 1.51 ms      |
| 32               | 3.06 ms      |
| 64               | 6.07 ms      |
| 128              | 12.08 ms     |
| 256              | 12.68 ms     |
When we issued multiple OIOs to a single 4KB block, those I/Os were serialized onto the single channel inside the SSD device that served that offset. In other words, we lost the benefit of the SSD’s internal parallelism across multiple channels. The device latency rose as we increased the number of OIOs: high OIO to the same LBA (or a small range of LBAs) caused high device read latency.
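The serialization effect can be approximated with a toy queueing model; the channel count and per-read service time below are illustrative assumptions, not measurements of any particular SSD:

```python
# Toy queueing model of SSD reads (illustrative assumptions, not data
# from a real device): the SSD has CHANNELS independent channels, and a
# 4KB read occupies one channel for SERVICE_MS milliseconds.
SERVICE_MS = 0.1   # assumed per-read service time at the flash channel
CHANNELS = 16      # assumed internal parallelism

def avg_latency_ms(oio: int, distinct_lbas: int) -> float:
    """Average read latency when `oio` concurrent reads target
    `distinct_lbas` different addresses."""
    channels_used = min(distinct_lbas, CHANNELS)
    # Reads to the same LBA serialize on one channel, so each read
    # waits behind the others queued on that channel.
    return (oio / channels_used) * SERVICE_MS

# All OIOs aimed at a single LBA: latency grows linearly with OIO,
# much like the measurements above (the linear model ignores the
# saturation visible at 256 OIOs).
for oio in (1, 16, 32, 64, 128):
    print(oio, avg_latency_ms(oio, distinct_lbas=1))

# The same load spread over many LBAs keeps latency low.
print(avg_latency_ms(128, distinct_lbas=128))  # 0.8 ms instead of 12.8 ms
```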
In the extreme case where we prepared the disk by zeroing out all the blocks, all the data was deduplicated to one block. As a result, the upcoming read I/O was issued to the same device address, which caused high device read latency as discussed above.
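A minimal sketch of a deduplicating block map shows this collapse: after zero-filling, every logical block points at one physical block, so reads scattered across the logical address space all hit the same device address. The map structure below is purely illustrative:

```python
import hashlib

# Toy deduplicating block map (purely illustrative): content-identical
# logical blocks share one physical block.
store = {}        # content hash -> physical block number
logical_map = {}  # logical block number -> physical block number

def write_block(lba: int, data: bytes) -> None:
    key = hashlib.sha256(data).digest()
    if key not in store:
        store[key] = len(store)    # allocate a new physical block
    logical_map[lba] = store[key]  # point the LBA at the shared block

# Zero out 1,000 logical blocks, as preparing the disk with dd would.
for lba in range(1000):
    write_block(lba, bytes(4096))

# Reads spread across the logical address space now all land on the
# same physical address.
print({logical_map[lba] for lba in range(1000)})  # {0}
```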
Figures 1 and 2 (stats from the Virtual SAN observer tool) show sample results from our test. (Even though the screenshots show HDDs, our test was on an all-flash Virtual SAN cluster; the HDDs in the graph are actually SSDs used as capacity-tier devices.) As can be seen, inside one disk group, one capacity-tier SSD (naa.55cd2e404ba2ce71 in Figure 1; naa.55cd2e404ba535b7 in Figure 2) consistently shows higher read latency than the other capacity-tier SSDs. This is because we zeroed out the data blocks before running the test; later in the test, a large amount of outstanding read I/Os were issued to a single address on that SSD.
Note: In Figures 1 and 2, where there are no units specified, the unit is milliseconds. Where an “m” is specified, the unit is microseconds. Where “k” is specified, the unit is thousands.
Figure 1. Sample test 1 showed the first capacity-tier SSD (“HDD”) to have up to 3 milliseconds latency, which is much higher than the next two capacity-tier SSDs (also labelled “HDD”), which show just below 150 microseconds of latency.
Figure 2. Sample test 2 is similar to sample test 1. The bottom capacity-tier SSD (“HDD”) shows up to 6 milliseconds of latency, whereas the first two show slightly over 100 microseconds.
Issue #2: Deduplication ratio was not what we set
Because Virtual SAN distributes multiple virtual disks across its datastore, it is hard to determine the exact deduplication ratio of the data a workload generates. In the FIO configuration file, we set dedupe_percentage to the desired value. However, in the test system, a couple of factors affected the actual deduplication ratio reported by Virtual SAN.
- I/Os from other virtual disks (vmdk files) can contain duplicated data. In the FIO configuration file, if the randrepeat parameter is set to 1, FIO uses the same random seed for all the disks. Although the data pattern on each vmdk obeys the dedupe_percentage set by the user, there will be highly duplicated data across vmdks. Because those vmdks are placed on the same Virtual SAN datastore, the datastore will see more duplicated data than specified.
- The I/O size used when preparing the disks also affects the deduplication ratio. Virtual SAN currently uses a 4KB chunk size as the unit for calculating the deduplication ratio. If the user prepares the disk with a non-4KB I/O size, Virtual SAN can see a different deduplication ratio. Likewise, if the I/O is not 4KB-aligned (the blockalign parameter), Virtual SAN can also observe a different deduplication ratio.
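Both effects can be demonstrated with a few lines of Python (an illustration of 4KB-chunk hashing, not Virtual SAN's actual implementation): a byte-identical stream written 2KB out of alignment shares no 4KB chunks with its aligned copy, while two disks filled from the same random seed, as with randrepeat=1, fully duplicate each other:

```python
import hashlib
import random

CHUNK = 4096  # deduplication is computed over 4KB chunks

def chunk_hashes(data: bytes) -> set:
    """Set of hashes of the 4KB chunks in `data`."""
    return {hashlib.sha256(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)}

# Effect of alignment: the same bytes, shifted by 2KB, share no 4KB chunks.
payload = random.Random(1).randbytes(256 * CHUNK)
aligned_copy = payload                 # written at a 4KB boundary
shifted_copy = bytes(2048) + payload   # same data, 2KB misaligned
print(len(chunk_hashes(aligned_copy) & chunk_hashes(shifted_copy)))  # 0

# Effect of randrepeat=1: the same seed on every disk duplicates data
# across disks, even though each disk is internally unique.
disk_a = random.Random(1234).randbytes(4 * CHUNK)
disk_b = random.Random(1234).randbytes(4 * CHUNK)
print(len(chunk_hashes(disk_a) & chunk_hashes(disk_b)))  # 4 of 4 chunks shared
```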
Figure 3 (below) shows a sample test in which we ran FIO with a 0% deduplication ratio. Due to the factors described above, Virtual SAN reports a deduplication ratio of about 80% (shown in blue).
Figure 3. The blue line shows a deduplication percentage of about 80%, even though we set deduplication to be 0%.
To avoid these problems, we suggest that performance testers prepare the disks with a 4KB-aligned, 4KB I/O size and set randrepeat to 0 in order to get the desired deduplication ratio. Note that Virtual SAN properly handles any type of I/O configuration; the purpose of this blog is to explain the possible discrepancy between the FIO-specified dedupe_percentage and the deduplication ratio Virtual SAN reports when performance testers use different I/O configurations to evaluate the Virtual SAN datastore.
Figure 4. No blue line is shown, indicating the correct deduplication of 0%.