If there is one thing that is sure to fire up a debate in VMware storage circles, it is whether or not a VMFS volume should make use of extents. I’ve watched many an email thread about this, and engaged in a few discussions myself. What I want to do in this post is show what some of the pros and cons are, explode some of the myths, and then let you make up your own mind as to whether you want to use them or not. I’ll give you my own opinion at the end of the post.
What is an Extent?
Probably best to describe what an extent is first of all. A VMFS volume resides on one or more extents. Each extent is backed by a partition on a physical device, such as a LUN. Normally there will be only one extent per LUN (the whole LUN contains a single partition which is used as a VMFS extent). Extents are used when you create a VMFS volume, and more extents can be added if you want to expand the volume.
The maximum size of a single VMFS-3 extent was 2TB. This was due to a number of things, including our reliance on SCSI-2 addressing and the MBR (Master Boot Record) partition format. Adding additional extents to a VMFS-3 volume was the only way to extend it above 2TB. No single LUN/extent used for VMFS-3 could be above 2TB. Adding 32 extents/LUNs to the same VMFS-3 volume gave you a maximum size of 64TB. VMFS-5 volumes, introduced in vSphere 5.0, can be as large as 64TB on a single extent. This is because we implemented the GPT (GUID Partition Table) format & made a significant number of SCSI improvements.
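To put some rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes 512-byte sectors and 32-bit block addressing from the MBR/SCSI-2 era, so treat it as illustrative arithmetic rather than an official derivation:

```python
# Back-of-the-envelope arithmetic, assuming 512-byte sectors and 32-bit block
# addressing from the MBR/SCSI-2 era. Illustrative only.
SECTOR_SIZE = 512                          # bytes per sector (assumption)

mbr_max_bytes = (2 ** 32) * SECTOR_SIZE    # 32-bit block addresses
print(mbr_max_bytes / 2 ** 40)             # ~2 TiB per extent in the VMFS-3 days

vmfs3_max_bytes = 32 * mbr_max_bytes       # 32 extents of ~2TB each
print(vmfs3_max_bytes / 2 ** 40)           # ~64 TiB for a fully spanned VMFS-3 volume

vmfs5_single_extent = 64 * 2 ** 40         # VMFS-5 with GPT: 64TB on a single extent
print(vmfs5_single_extent / 2 ** 40)       # 64 TiB, no extents required
```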
This all seems ok, right? So what’s the problem? Why have extents got a bad name? Let’s begin by exploding a few of the myths around extents.
Misconception #1 – Extents are like RAID stripes
This is one of the common misconceptions. I’ve seen some folks believe that the Virtual Machines deployed on a VMFS volume (with extents) are striped (or the file blocks/clusters allocated to the VMs are striped) across the different extents.
This is not correct. Extents are not like stripes. If anything, extents are more akin to concatenations than stripes. They do not rotate Virtual Machine assignments or even VM block or cluster allocation assignments across different extents in the datastore.
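To make the concatenation-versus-striping distinction a little more concrete, here is a small Python sketch. It is purely a toy model of address mapping (it is not how VMFS is actually implemented), but it shows why a VM whose blocks sit early in the address space stays on one extent rather than being rotated across all of them:

```python
# Toy model of address mapping - illustrative only, not actual VMFS code.

def concat_lookup(lba, extent_sizes):
    """Concatenation: each extent covers one contiguous range of the address space."""
    for idx, size in enumerate(extent_sizes):
        if lba < size:
            return idx, lba                 # (extent index, offset within that extent)
        lba -= size
    raise ValueError("LBA beyond end of volume")

def stripe_lookup(lba, num_extents, stripe_blocks):
    """Striping (what VMFS does NOT do): rotate fixed-size chunks across extents."""
    chunk = lba // stripe_blocks
    offset = (chunk // num_extents) * stripe_blocks + lba % stripe_blocks
    return chunk % num_extents, offset

extents = [1000, 1000, 1000]                # three extents of 1000 blocks each
blocks = [0, 100, 200]                      # blocks belonging to one VM, early in the volume
print([concat_lookup(b, extents)[0] for b in blocks])      # [0, 0, 0] - all on extent 0
print([stripe_lookup(b, 3, 100)[0] for b in blocks])       # [0, 1, 2] - rotated, if striped
```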
I think this misconception arises because it is being confused with how resource management does things on a VMFS volume. VMFS resource management attempts to separate cluster resources on a per-host basis to avoid any lock contention, etc. You may observe VMs from different hosts being placed at completely different offset locations on a VMFS datastore, and perhaps even on different extents. My good friend Satyam Vaghani did a very good presentation on this at his VMworld 2009 session, TA3320. The hosts try to put distance between themselves on the datastore to reduce any contention for resources, but they still try to keep the objects that they manage close together.
The X-axis represents the volume layout. The files on the left are from host A, the files to the center right are from host B. As you can see, host B's files are offset from the start of the volume. These offsets typically follow a uniform distribution across the entire file system address space. So, in a multi-extent situation, concurrent hosts accessing a VMFS volume will naturally spread themselves out across all extents (since the address space is a logical concatenation of all extents). In effect, if you have multiple hosts accessing a VMFS volume, the load may be distributed across multiple extents. Note that if it is a single host or a very small number of hosts on a very large number of extents, the load may not be evenly distributed.
Not only that, but the resource manager also tries to allocate a contiguous range of blocks to disk files (thin, thick, eagerzeroed thick) as can be seen by this slide also taken from Satyam’s presentation.
See this example of VMs with different virtual disks deployed to a VMFS-3:
Here the X-axis is the volume layout, and the Y-axis represents the number of blocks. Of course, as available space reduces, you could find a virtual disk spanning two or more extents in a datastore. The same could be true for thin disks, which might need to allocate their next resource cluster from another extent.
Because of this contiguous allocation of space (which can be on the order of hundreds of MB or even GB), VMFS does not suffer from the traditional fragmentation issues seen on other filesystems. However, if a file that is grown by Host A at t0 is later grown by Host B at t1, and the same per-host resource distribution scheme is in play, then it is likely that the file block clusters for that file will be scattered across the logical address space. When you think about DRS management of VMs, and the use of thin disks, you can see that those disks will end up getting resources from various regions. It's still not enough to raise concerns about fragmentation however.
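Here is a small toy model of that scattering effect. The region sizes and per-host offsets below are completely made up for illustration; the point is simply that a file grown by two different hosts at different times ends up with contiguous runs of clusters in two widely separated parts of the address space:

```python
# Toy model only - the region sizes and per-host offsets are made up for illustration.
REGION = 10_000                              # hypothetical per-host allocation region, in blocks
host_cursor = {"A": 0, "B": 5 * REGION}      # hosts start their allocations far apart

def grow_file(file_clusters, host, clusters):
    """Append a contiguous run of clusters taken from the growing host's region."""
    start = host_cursor[host]
    file_clusters.extend(range(start, start + clusters))
    host_cursor[host] += clusters
    return file_clusters

vmdk = []
grow_file(vmdk, "A", 3)     # grown by Host A at t0
grow_file(vmdk, "B", 3)     # grown by Host B at t1 (e.g. after the VM moved hosts)
print(vmdk)                 # [0, 1, 2, 50000, 50001, 50002] - two contiguous but distant runs
```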
Misconception #2 – Losing one extent will offline the whole volume
Not completely true. Back in the VMFS-2 days, this was certainly true, but significant enhancements have been made to VMFS extents over the years that allow a datastore to stay online even if one of its extent components is offline. See this posting I made on such enhancements. Now, we don't as yet have this surfaced as an alarm in vCenter, but it is definitely something we are looking at exposing at the vCenter layer in a future release.
However, if the head extent (the first member) has a failure, it can bring the whole datastore offline. A head extent going offline is pretty much always going to cause failures, because many of the address resolution resources are on the head extent. Additionally, if a non-head extent member goes down, you won't be able to access the VMs whose virtual disks have at least one block on that extent.
But is this really any more problematic than having an issue with a LUN which backs a single-extent VMFS volume? For the most part, no. It's only when the head has an issue that this has more of an impact.
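If it helps to reason about the blast radius, here is a conceptual sketch (made-up data, not a VMware tool) of which VMs are affected when a given extent goes offline:

```python
# Conceptual sketch with made-up data: which VMs are affected when an extent goes offline?
# vm_extents maps each VM to the set of extents holding at least one block of its virtual disks.
vm_extents = {
    "vm01": {0},        # entirely on the head extent
    "vm02": {1},        # entirely on extent 1
    "vm03": {1, 2},     # spans extents 1 and 2
}

def impacted_vms(failed_extent, vm_extents, head_extent=0):
    if failed_extent == head_extent:
        # The head extent holds address resolution resources, so treat the whole datastore as down.
        return set(vm_extents)
    return {vm for vm, extents in vm_extents.items() if failed_extent in extents}

print(impacted_vms(2, vm_extents))   # {'vm03'} - only VMs with blocks on the failed extent
print(impacted_vms(0, vm_extents))   # all three VMs - losing the head takes the datastore down
```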
Misconception #3 – It's easy to mistakenly overwrite extents in vCenter
I've heard this still being brought up as an issue. Basically, the scenario described is where vCenter shows LUNs (which are already in use as extents for a VMFS datastore) as free, and will let you initialize them when you do an Add Storage task.
If memory serves, the issue described here could be as old as Virtual Center 1.x (this was in the days before we started calling it vCenter). I’m pretty sure that this was resolved in version 2.x, and definitely is not an issue with the vCenter 4.x & 5.x releases. I think this occurred when you built an extent on one host, and then flipped onto a view from another ESXi host which didn’t know that the LUN was now in use. These days, any changes made to a datastore, where a LUN is added as an extent, updates all the inventory objects so that this LUN is removed from the available disks pool. Coupled with the fact that we now have a cluster wide rescan option for storage, there should no longer be any concerns around this.
Obviously, if you decide to start working outside of vCenter and decide to work directly on the ESXi hosts, you could still run into this issue. But you wouldn’t do that, would you? 😉
Misconception #4 – You get better performance from extents
This is an interesting one. It basically suggests that using extents will give you better performance, because you have an aggregate of the queue depth from all extents/LUNs in the datastore. There is some merit to this. You could indeed make use of the per-device queue depth to get an aggregate queue depth across all extents. But this is only relevant if a larger queue depth will improve performance, which may not always be the case. I also think that to benefit from the aggregate queue depth, each of the extents/LUNs that makes up the volume may have to be placed on different paths, or you may need to implement Round Robin, which not every storage array supports. So this doesn't just work out of the box; there is some configuration necessary.
My thoughts on this are that if you are using a single-extent datastore, and think a larger queue depth will give you more performance, then you can simply edit the per-device queue depth and bump it up from the default value of 32 to, say, 64. Of course, you should do your research in advance to see if this will indeed improve your performance. And keep in mind that the maximum queue depth of your HBA and the number of paths to your device need to be taken into account before making any of these changes.
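As a back-of-the-envelope comparison (the default per-device queue depth of 32 is the only real number here; the extent count, tuned value and HBA limit are assumptions made up for illustration):

```python
# Back-of-the-envelope comparison - extent count, tuned value and HBA limit are assumptions.
DEFAULT_DEVICE_QDEPTH = 32

# Option 1: a datastore spanning multiple extents/LUNs.
num_extents = 4
aggregate_extents = num_extents * DEFAULT_DEVICE_QDEPTH    # 128 outstanding I/Os in theory,
                                                            # but only if I/O lands on all extents

# Option 2: a single-extent datastore with the per-device queue depth bumped up instead.
tuned_qdepth = 64
aggregate_single = tuned_qdepth                             # 64 outstanding I/Os

# Whatever the per-device numbers say, the HBA's own limits and the number of paths
# still cap what you actually get.
hba_qdepth = 64                                             # hypothetical HBA limit
print(min(aggregate_extents, hba_qdepth))                   # 64
print(min(aggregate_single, hba_qdepth))                    # 64
```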
VMware’s Best Practice/Recommendation for Extents
I discussed this with our engineering and support folks, and in all honesty, considering the management overhead with extents, the new single-extent VMFS-5 volume size, the ability to grow datastores online with the volume grow facility, and the ability to tune the per-device queue depth, the recommendation would be to avoid the use of extents unless you absolutely have to use them. In fact, the only cases I can see where you might need extents are:
- You are still on VMFS-3 and need a datastore larger than 2TB.
- You have storage devices which cannot be grown at the back-end, but you need a datastore larger than 2TB.
There is nothing inherently wrong with extents, but the complexity involved in managing them has given them a bad name. Consider a 32-host cluster which shares a VMFS volume comprised of 32 extents/LUNs. This volume has to be presented to each host. It becomes quite a challenge to ensure that each host sees the same LUNs in exactly the same way. Now bring something like SRM (Site Recovery Manager) into the picture: if you want to fail over successfully to a remote site, all of these LUNs need to be replicated correctly, and in the event of a failover, they may need to be resignatured and mounted on all the hosts at the DR site. So this becomes a formidable task. It is primarily because of this complexity that I make this recommendation. VMFS-5 provides better management capabilities by allowing for these larger LUN sizes, which makes a significant amount of the storage administration overhead go away.
Get notification of these blogs postings and more VMware Storage information by following me on Twitter: @VMwareStorage
Fermarloe
Very interesting post! Thanks for sharing your knowledge and conclusions!
ricky
I wanted to point out a performance-related issue I have observed working with extents. A scenario may arise where a virtual machine is created across multiple extents. (Notice I didn't say striped.) While this is not a major problem, there is a slight performance hit. I could never determine the method ESX used to select which extent to place the virtual machine on, or if there was a way to prevent a virtual machine from ending up on two extents. Based on my observations, ESX tends to lump the virtual machines on one or two of the extents before moving on to use another extent. This hinders the ability to use multiple paths and also creates contention if too many virtual machines are trying to access the same LUN.
Daren DiClaudio
Thank you for posting this, this was exactly what I needed.
John Yarborough
Thanks for the great post! I do have a question though, and maybe you refer to this situation in bullet point #2 at the end. I have an older SAN that only lets me create LUNs up to 2TB. Right now I have 3 different LUNs so that I can utilize all the capacity of the disks in the underlying aggregate. Would it make sense to combine these 2TB LUNs into a VMFS volume with extents? Since they are physically all on the same disks anyway, would there be any real downfall? Especially considering I already have the 3 LUNs living on the same disks? Thanks!
Cormac
As long as all LUNs are presented to each of the ESXi hosts in an identical fashion, John, there should be no concerns with putting all 3 LUNs into a single VMFS volume.
Jim
Hi, great post that answered most of my questions. One thing I am not sure about is locking. Being a SAN guy, I would have expected the locking in a spanned VMFS to occur at the LUN or extent level, thus affecting only individual LUNs/extents in a spanned volume. However, everything I have read suggests that locks are applied at the VMFS level. Is this just a misconception brought about by the fact that most VMFS volumes are on a single LUN?
Cormac
Hi Jim,
When using extents, we only place the SCSI reservation on the first extent of the volume. Therefore, any I/O from VMs destined for blocks residing on any extent other than the first never sees reservation conflicts. Of course, with VAAI, this is all moot.
Cormac
PiroNet
Hi Cormac,
How do extents affect the maximum number of paths usable by a host?
Can we say that even though a host might be connected to only a couple of large datastores, if they are backed by many, many extents (LUNs), this will greatly affect the number of paths a host can use?
i.e. a host with 2 HBAs with 2 ports each and 4 uplinks can see a maximum of 64 LUNs – that is 1024 paths / (2x2x4).
If both datastores are backed by 32 extents each, we have reached the maximum number of paths, right (2×32=64 LUNs)?
To work around this issue, we can:
– decrease the number of extents per datastore
– decrease the number of ports per HBA
– decrease the number of HBAs (if lack of redundancy is not an issue)
Thx,
Didier
Cormac
Hi Didier,
Yes – the path count is based on LUNs, not datastores. So if you have multiple LUNs participating as extents in a VMFS datastore, each extent/LUN counts towards the path count.
Cormac
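To put numbers on Didier's example, here is a quick sketch using the figures from his comment, including the 1024 paths-per-host limit he mentions:

```python
# Quick check of the path arithmetic in Didier's example.
MAX_PATHS_PER_HOST = 1024

paths_per_lun = 2 * 2 * 4                     # 2 HBAs x 2 ports each x 4 uplinks, as Didier counts them
max_luns = MAX_PATHS_PER_HOST // paths_per_lun
print(max_luns)                               # 64 LUNs visible at most

luns_needed = 2 * 32                          # two datastores of 32 extents each
print(luns_needed <= max_luns)                # True - exactly at the limit, with no headroom left
```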
Sunny Dua
Excellent post. Appreciate you sharing the knowledge.
Have a question on the Volume Grow feature. If I have 4 LUNs of 512 GB each, 2 owned by SP A and the other 2 owned by SP B, can I create a datastore of 2TB by using Volume Grow? I know that the answer is YES to this question, but here is the myth I am trying to fight:
This would utilize both SPs equally, and port utilization on the storage end would be load balanced since I am talking to both controllers while writing/reading data. Is this true? Or would I also go and talk to the LUN which has the metadata written on it (the 1st LUN)?
Do you think a better solution would be to use RR and at least use all the ports to the owning controller?
Regards
Sunny
Cormac Hogan
I would go with RR. Even placing different extents on different paths is no guarantee that you will achieve load balancing. You may have some very busy VMs on one extent and idle VMs on another extent. Also, the management overhead in configuring LUNs on different SPs makes it somewhat tedious to set up. Use RR, and let it do its thing 🙂
Lony Namer
I have a question. I am building a cluster with an HP C7000 blade center and a 3PAR V7200 storage array with 30 x 900 GB SAS disks. 7 HP Gen8 servers, each with dual 6-core CPUs and 128 GB RAM. I have 500 VMs, 13 TB in total. 2 x 8 Gb fiber switches, 2 controllers (each controller has 2 CPUs) and 4 paths to the controllers. Each server is connected to each fiber switch. vCenter and ESXi 5.1 U1.
I am not an expert but have researched a lot. This storage is totally virtual and spreads all the LUNs across all the disks. Everything can grow and shrink easily, and it has thin-provisioned LUNs. It detects zeroes and does not write them. It communicates natively with ESXi 5.x via VAAI, does a lot of things at its own level, and doesn't lock the whole volume for SCSI reservations; each VM locks granularly, just a small part of the volume. So there is no SCSI reservation volume locking issue.
You don't recommend multiple extents in VMFS-5. So my question is, would it be good practice to create a single 16 TB datastore on 1 extent so as not to have I/O performance bottlenecks? Or is it best to divide it into 2 TB datastores in an SDRS cluster?
Regards,
Lony
Saravanan AR
Does the VMkernel behave differently:
When you have a single extent as a datastore which is thin provisioned on the storage side?
When you have multiple extents as a single datastore which is thin provisioned on the storage side?
The difference I see is the free space in each case. I put a few VMDK files on a single-extent datastore and the free space was 188 GB. When the same VMDKs were storage migrated to a multi-extent datastore, the free space was just 28 GB.
Regards
Saran AR
Cormac Hogan
Sounds like those VMDKs changed format from thin to thick? Can you confirm if that is the case?
Mark Jarvis
Hi,
Great article
I would like to confirm a point you raise.
You state that the cases where you see a need for extents include "2. You have storage devices which cannot be grown at the back-end, but you need a datastore larger than 2TB". But unless I misunderstand (and the GUI seems to reflect this), even if you increase the back-end LUN/volume size, asking vSphere to increase the size of the datastore creates an extent on the same LUN rather than resizing the VMFS partition?
Regards
Mark
George Parker
Great article! Two questions I've got: 1. On a VMFS-3 volume with multiple extents, does the "head" extent contain metadata that the read/write heads of the drives have to keep moving backwards and forwards to, therefore negatively impacting performance?
2. If I upgrade a VMFS-3 Volume containing multiple extents to VMFS-5, are there any performance implications?
Regards,
George.
Ravindra M
What happens when we extend the same LUN and increase the volume? Is there anything to be taken into consideration before doing this?
Vittorio
Hi Cormac,
Very good, insightful article, thanks for sharing!