How big should my VMDKs be on vSAN?
I was recently asked about VMDK size best practices with vSAN. The maximum supported VMDK size is 62TB. Much like discussions around the largest size for a vSAN or vSphere cluster (check out this post!), this is very much a nuanced operational discussion with a lot of interesting factors.
Back End Data Placement
One concern is that more objects lead to more striping on the back end of the cluster. As the vSAN design and sizing guide explains, large VMDKs (vSAN objects) are "chunked" into many smaller components, with a maximum component size of 255GB.
A single 62TB VMDK with NumberOfFailuresToTolerate = 1 will require 500 or so components in the cluster (though many of these components can reside on the same physical devices). There is not a sizable component difference between 62 x 1TB volumes and a single 62TB volume.
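As a rough back-of-the-envelope sketch of where that ~500 number comes from (assuming a simple RAID 1 mirror plus one witness per object; actual component placement, striping, and witness counts are decided by vSAN), the arithmetic looks like this:

```python
import math

MAX_COMPONENT_GB = 255  # maximum vSAN component size

def estimate_components(vmdk_size_gb, mirrors=2):
    """Rough component estimate for one RAID 1 object: data components
    per mirror copy plus one witness. Real placement (extra stripes,
    extra witnesses, small remainders) is decided by vSAN, so treat
    this as an approximation only."""
    per_copy = math.ceil(vmdk_size_gb / MAX_COMPONENT_GB)
    return per_copy * mirrors + 1

one_big = estimate_components(62 * 1024)          # 1 x 62TB VMDK -> ~500
many_small = 62 * estimate_components(1 * 1024)   # 62 x 1TB VMDKs
print(one_big, many_small)  # both land in the hundreds of components
```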
As Cormac and others have explained before, vSAN does a pretty good job of splitting up large VMDKs across the back end.
Backup Operations
Depending on how you back up your data (in-guest agents, VADP-based changed block tracking, VAIO-based data streaming), there can be implications to using fewer, larger VMDKs. VADP will commonly send data through only a single data mover per VM, so if the data can be split across multiple VMs this may allow for greater parallelization of backups. If you are running an incremental-forever scheme using CBT, a 62TB VMDK may only pose a challenge for the initial full backup, but it can become more problematic if large changes are introduced, if something resets the CBT map and forces a new full, or, in the case of vSphere Replication, if a full reseed is needed. In the case of a large file server, utilizing DFS and splitting the shares up into multiple VMs could provide additional operational agility.
Splitting out volumes also allows backup software that works at the volume level to more easily "exempt" specific volumes, and the use of independent disks can enable you to avoid snapshotting volumes that you do not need to back up.
Restore operations could also be impacted. While most data restores are small in nature (accidentally deleted files), an operator error that deletes a partition, or another failure with a larger "blast radius," could lead to a longer restore time. This challenge can largely be mitigated by keeping full replicas of the VMs that can be activated on demand, or by using "instant recovery" style backup solutions (often using NFS or VAIO to share the data back).
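To put a rough number on the restore-time concern, here is a hedged back-of-the-envelope estimate; the 500 MB/s per-stream figure is an assumption for illustration, not a benchmark, and real restores rarely scale perfectly across parallel streams:

```python
def restore_hours(data_tb, streams=1, mb_per_sec_per_stream=500):
    """Naive restore-time estimate: total data divided by aggregate
    throughput. Ignores metadata overhead, dedupe rehydration, and
    imperfect scaling across parallel streams."""
    data_mb = data_tb * 1024 * 1024
    return data_mb / (streams * mb_per_sec_per_stream) / 3600

print(f"1 x 62TB VMDK, single stream      : {restore_hours(62):.1f} hours")
print(f"62 x 1TB VMDKs, 8 parallel streams: {restore_hours(62, streams=8):.1f} hours")
```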
Guest VM Performance
It is true that a single VMDK attached to a single vSCSI HBA presents a single serial IO queue. There are practical limits on how many IOPS, and what throughput at specific latencies, can be achieved through that one queue, and in some rare cases more vSCSI HBAs (and more volumes, as they cannot be multiplexed) may be necessary. Long term, end-to-end NVMe multi-queue IO paths will mitigate this. Note that this is partly why some large database vendors recommend multiple vSCSI adapters and volumes as part of a reference architecture.
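As a purely illustrative sketch (a hypothetical planning helper, not a VMware tool; actual placement is configured in the VM's virtual hardware), spreading data VMDKs round-robin across multiple paravirtual SCSI controllers keeps any one adapter queue from having to serialize all of the IO:

```python
from collections import defaultdict

MAX_SCSI_CONTROLLERS = 4   # vSphere limit on SCSI controllers per VM

def spread_disks(disk_names, controllers=MAX_SCSI_CONTROLLERS):
    """Round-robin VMDKs across vSCSI controllers so IO queues are
    shared rather than funneled through a single adapter. Purely
    illustrative -- this script does not touch the VM configuration."""
    layout = defaultdict(list)
    for i, disk in enumerate(disk_names):
        layout[f"scsi{i % controllers}"].append(disk)
    return dict(layout)

disks = [f"data_{n:02d}.vmdk" for n in range(8)]
for controller, assigned in spread_disks(disks).items():
    print(controller, assigned)
```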
Thin Provisioning and Capacity Management
Thin provisioning is a powerful technology that allows you to over-provision up front and avoid the hassle of constantly expanding guest partitions and file systems as VMDKs need to grow. Unfortunately, many modern file systems are "thin unfriendly" and will gradually redirect writes into fresh free space on the partition until the VMDK becomes effectively "thick" and inflated. This can be mitigated by enabling vSAN Space Reclamation on the cluster (commonly used for persistent VDI clusters). For more information check out this video: #StorageMinute: vSAN Space Reclamation, and read this guide to using the feature.
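As a toy model of that "thin unfriendly" behavior (hypothetical numbers, not measurements), the sketch below shows how a guest file system that keeps steering writes into untouched blocks drags a thin VMDK toward its high-water mark unless reclamation returns the freed blocks:

```python
def allocated_after_churn(vmdk_size_gb, live_data_gb, churn_gb, unmap_enabled):
    """Toy model of thin-disk growth: the guest rewrites 'churn_gb' of
    data into previously untouched blocks. Without UNMAP the backing
    VMDK keeps every block it has ever touched (high-water mark);
    with UNMAP the freed blocks are handed back to vSAN."""
    touched = min(live_data_gb + churn_gb, vmdk_size_gb)
    return live_data_gb if unmap_enabled else touched

print(allocated_after_churn(1024, live_data_gb=200, churn_gb=600, unmap_enabled=False))  # 800 GB allocated
print(allocated_after_churn(1024, live_data_gb=200, churn_gb=600, unmap_enabled=True))   # 200 GB allocated
```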
File System Concerns
Some legacy file systems are not advisable to run at sizes approaching 62TB. Many Windows admins avoid growing NTFS to this size over concerns around chkdsk times, and EXT3 and older file systems may not scale to this size at all. While modern file systems (ReFS, XFS) don't have these issues, it is a concern to be aware of.
Other Operational Considerations
A common design is to isolate logs onto their own partitions and VMDKs. While this can be useful for IO reasons (for example, setting RAID 1 on write-heavy volumes), it is also useful for preventing logs that fill up a volume from crashing applications that need to write to it. This is especially true with logging systems that lack auto-grooming capabilities, or with application developers who have not discovered how to place logs into /var/log.
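As a trivial illustration (hypothetical mount points and threshold), a dedicated log volume can be watched and groomed on its own, while the data volume is never at risk of filling:

```python
import shutil

def volume_nearly_full(path, threshold=0.90):
    """Return True when a mount point is above the given usage
    threshold -- e.g. a dedicated /var/log VMDK filling up, which can
    then be groomed or expanded without touching the data volume."""
    usage = shutil.disk_usage(path)
    return (usage.used / usage.total) >= threshold

# Hypothetical mount points for separate log and data VMDKs
for mount in ("/var/log", "/data"):
    print(mount, "nearly full" if volume_nearly_full(mount) else "ok")
```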
Final Thoughts
vSAN takes a lot of the thought out of managing large volumes, but administrators still have other operational considerations to weigh when designing virtual machines and choosing VMDK sizes.