Should I defrag my Guest OS?

Yep – this old chestnut. :-)

This has come up time and time again, and I am going to share with you some conversations that have been occurring within VMware on this topic. In fact, we’ve been having these conversations for a long time now.

What is it that defragmentation is supposed to give you?

Well, historically, if you ran a defragmentation* operation against an OS disk (typically Windows), you would expect to see a performance improvement. Defragmentation moves blocks around the disk to bring together blocks belonging to the same file in an effort to make the file contiguous on disk. This means that sequential I/O operations should be faster after a defrag. Here’s a view of the Disk Defragmenter that is part of the System Tools with Windows 7:

What about defragmentation of a Guest OS in a Virtual Machine?

This is very different from running a defrag on a physical host with a local disk. Typically you are going to have multiple VMs running together on a VMFS or NFS volume. Therefore the overall I/O to the underlying LUN is going to be random, so defragmenting individual Guest OSes is not really going to help performance. However, there are other concerns that you need to keep in mind. The easiest way to explain the concerns is to give you some scenarios of what might happen to a VM which is defragged, and what impact it has on the various vSphere technologies. You can then make up your own mind about whether it is a good idea or not.

  1. Thin Provisioned VMs. If you defragment a Thin Provisioned VM, as file blocks are moved around, the TP VMDK bloats up, consuming much more disk space.
  2. Linked Clone VMs (vCloud Director, View). In the case of a VM running off of a linked clone, the defragmenter bloats up the linked clone redo logs.
  3. Replicated VMs (Site Recovery Manager, vSphere Replication). If your VM is being replicated, and you defragment the VM on the protected site, it could well cause a lot of data to be sent over the WAN to the replicated site.
  4. Snapshot’ed VMs. This is a similar use case to Linked Clones. Any VMs running off of a snapshot which ran a defrag would cause the snapshot to inflate considerably, depending on how many blocks were moved during the defrag operation.
  5. Changed Block Tracking (VMware Data Recovery). The CBT feature is used heavily by backup products, including VMware Data Recovery (VDR). This feature tracks which of a VM’s disk blocks have changed so that only those blocks need to be backed up. If a defrag is run, the number of blocks that change will increase, which means more data will have to be backed up, meaning a longer backup time.
  6. Storage vMotion. Storage vMotion also uses CBT in vSphere 4.0. If a VM was being Storage vMotion’ed when a defrag operation was initiated, it would also impact the time to complete the operation since the defrag is changing blocks during the migration.
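
Scenarios 1, 4 and 5 share one mechanism: a defrag rewrites blocks whose contents have not changed, and every rewritten block looks like new data to whatever sits beneath the guest. Here is a minimal Python sketch of that effect. It is not a model of VMFS or of any VMware API; the `defrag_write_footprint` helper, the block numbers and the disk layout are all made up for illustration.

```python
def defrag_write_footprint(file_blocks, allocated, first_free):
    """Simulate a naive defrag that copies a fragmented file into a
    contiguous run of blocks starting at `first_free`.

    file_blocks: guest block numbers currently holding the file (hypothetical)
    allocated:   set of blocks the thin disk has already backed with storage
    Returns (blocks_written, newly_allocated).
    """
    # Destination: one contiguous run, the same length as the file.
    destination = range(first_free, first_free + len(file_blocks))
    # Every destination block is written, so a snapshot delta or a
    # changed-block tracker would record all of them as changed.
    blocks_written = set(destination)
    # Blocks that were never touched before must now be backed by storage.
    newly_allocated = blocks_written - allocated
    return blocks_written, newly_allocated


# A 6-block file scattered across a disk where blocks 0..99 are in use.
fragmented_file = [3, 17, 42, 58, 71, 90]
already_allocated = set(range(100))

written, grown = defrag_write_footprint(
    fragmented_file, already_allocated, first_free=100
)
print(f"blocks written (seen as changed below the guest): {len(written)}")
print(f"blocks newly allocated on the thin disk: {len(grown)}")
```

In this toy layout the file's contents never change, yet every one of its blocks is written again, and because the destination lies in space the disk had not touched before, each write also grows the thin disk. The blocks the file vacated stay in the allocated set, which is the bloat described in scenario 1.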

Defragmentation also generates more I/O to the disk. This could be more of a concern to customers than any possible performance improvement that might be gained from the defrag. I should point out that I have read that, internally at VMware, we have not observed any noticeable improvement in performance after a defragmentation of Guest OSes residing on SAN or NAS based datastores.

I also want to highlight an additional scenario that uses an array based technology rather than a vSphere technology. If your storage array is capable of moving blocks of data between different storage tiers (SSD/SAS/SATA), e.g. EMC FAST, then defragmentation of the Guest OS doesn’t really make much sense. If your VM has been running for some time on tiered storage, then in all likelihood the array has already learnt where the hot-blocks are, and has relocated these onto the SSD. If you now go ahead and defrag, and move all of the VM’s blocks around again, the array is going to have to relearn where the hot-spots are.

If you automate the defrag to run regularly, I think this could cause a performance decrease rather than give you any sort of performance gain if the VM is deployed on a datastore backed by tiered storage. Scheduled defragmentation may already be enabled by default on some operating systems.
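
To make the tiering concern concrete, here is a rough Python sketch. It assumes, purely for illustration, that an array decides what to promote by counting accesses per block address; that is my simplification, not a description of how EMC FAST or any other product works. The access history and the relocation mapping are invented.

```python
from collections import Counter


def hot_set(access_log, top_n):
    """Return the `top_n` most frequently accessed block addresses."""
    return {addr for addr, _ in Counter(access_log).most_common(top_n)}


# Hypothetical access history: addresses 10 and 11 are the hot ones.
history = [10, 11, 10, 11, 10, 11, 40, 10, 11, 75]
promoted_to_ssd = hot_set(history, top_n=2)

# A defrag relocates the same data to new addresses (made-up mapping).
relocation = {10: 200, 11: 201}
hot_data_now_at = {relocation.get(addr, addr) for addr in promoted_to_ssd}

# Addresses that hold hot data AND are already on the fast tier.
still_on_ssd = hot_data_now_at & promoted_to_ssd

print(f"promoted before defrag: {sorted(promoted_to_ssd)}")
print(f"hot data lives at after defrag: {sorted(hot_data_now_at)}")
print(f"hot blocks still served from SSD: {len(still_on_ssd)}")
```

Under that assumption the data is just as hot as before, but it now sits at addresses with no access history, so the array has to observe the workload again before it can promote the right blocks.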

What do the Storage Array vendors say?

NetApp have a very good vSphere/NetApp interoperability white paper in which they briefly discuss this topic. Quoting directly from the paper: “VMs stored on NetApp storage arrays should not use disk defragmentation utilities because the WAFL file system is designed to optimally place and access data at a level below the guest operating system (GOS) file system. If a software vendor advises you to run disk defragmentation utilities inside of a VM, contact the NetApp Global Support Center before initiating this activity.”

What do you recommend?

My recommendation is not to use any defrag tools in the Guest OS. If you are being advised to use a defragmentation tool, you should now have a number of questions to raise about possible outcomes using the content in this blog posting.

* [1-March-2013] I wanted to add a clarification with regards to the defrag operation. This article is written with the generic Windows OS defragmenter in mind. Customers should be aware that VMware partners with vendors such as Condusiv/Diskeeper & Raxco who provide products which intelligently avoid fragmentation occurring in the first place, and also understand features like snapshots, etc. If excessive fragmentation is an issue in your environment, have a look at what these partners can offer.

Get notifications of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

This entry was posted in Storage, vSphere by Cormac Hogan.

About Cormac Hogan

Cormac Hogan is a senior technical marketing architect within the Cloud Infrastructure Product Marketing group at VMware. He is responsible for storage in general, with a focus on core VMware vSphere storage technologies and virtual storage, including the VMware vSphere® Storage Appliance. He has been at VMware since 2005 and in technical marketing since 2011.

23 thoughts on “Should I defrag my Guest OS?”

  1. While generally I would agree, I would see an issue in some cases, such as thick-provisioned file server volumes where fragmentation on a given volume is pushing 120+%. At least under Windows Server 2008/2003/2000 there were increased memory loads and general performance issues caused by the fragmentation. These effects were seen across two datacenters with multiple 10gig links on the network side and 4-way or higher density 8Gb FC SAN connectivity spread across shared AMS2500 disk. In general if fragmentation is under 20%, there is no need for it in a VM environment… but when there is a high percentage of fragmentation, does it still hold true?

  2. Hi Patrick, thanks for the comments. I’m sure there are always going to be corner cases. Defrag might help if you have a single thick-provisioned, non-snapshotted VM on local storage. But in any other configuration, especially for SAN & NAS datastores, I don’t see the benefit.

  3. Hi Chogan, I’m the Product Manager at a VMware Elite TAP partner and the makers of a VM Ready certified optimizer for ESX/i. As you noted, there are “side effects” that traditional (legacy) defrag can cause, and I appreciate your bringing attention to this. However, there are solutions on the market (not the Windows defragmenter though) that have technology that mitigates or entirely eliminates those undesirable effects. Those third party solutions exist and flourish as there are tangible benefits from reducing fragmentation, including on SANs. I can provide you this data. Please email me to discuss further.

  4. Hi Michael,
    If there is a product on the market that addresses the above issues with the defragmentation of a Guest OS, then I’d love to read about it.
    Will reach out shortly for references.

  5. Great post! This is something that I’ve often thought about too.
    I agree with you that most SAN/NAS filesystems are going to abstract the underlying disk layout and thus defragmentation would not be a beneficial operation (potentially detrimental in fact). Your point about performance is spot on too; you’d be really lucky to recoup the performance impact of running a defrag against a VM on a shared VMFS volume. However, I would argue that those using simple DAS or entry level SAN technology (like an HP P2000G3) would still benefit from a guest level defrag because there isn’t any “intelligence” going on in terms of how the controllers are writing data to the disks (excluding the use cases you brought up above too). Furthermore, I’m still concerned about the implications of a highly fragmented guest filesystem (even if the guest is completely ignorant of the underlying layout that actually resides on disk), for instance:
    - Would a highly fragmented (from the guest OS’s point of view) NTFS filesystem’s chkdsk speed be negatively affected?
    - Would the overall health of a Windows VM with a highly fragmented NTFS filesystem still be OK? Thinking about the effects on the MFT here too…

  6. Hi Ryan,
    Thanks for commenting. My goal with this blog post was to highlight some reasons why defrag would not be necessary in a virtual infrastructure as a whole (or perhaps even a bad idea). However there will always be corner cases, one of which you mentioned above. If the master file table (MFT) of an NTFS filesystem is also fragmented (extreme case scenario), then there will be additional overhead incurred while the OS first retrieves the MFT entry before getting the NTFS data.
    A previous poster mentioned the fact that V-locity 3 from Diskeeper is virtualization feature aware and also does optimal file placement to prevent fragmentation in the first place. I’m going to check them out at VMworld 2011 in Copenhagen to get further details.

  7. So while I am in agreement with some of the assertions in this article, I feel the conclusion is not complete. One overriding rule taken from the experience I’ve had with vmware (primarily storage) is that the entire stack must be optimized for best results or it will cause stress on the next weak point elsewhere in the stack. Focusing on one part while ignoring another will not yield best results. I feel that was done in this write up by focusing on the lower layers (SAN, channel, and VMFS), but completely missed the guest layer!
    He is right to say that, as a rule, running defrag against TP (thin provisioned), LC (linked clone), or auto-tiered storage is a bad idea and should be avoided. However, in the case of systems that are designed at the outset to be a high IO/low latency NTFS filesystem, TP and LC wouldn’t be used, and auto-tiering hasn’t been around long enough to employ. Thus, we’ll assume in this conversation we’re using a plain-jane thick-provisioned FC disk on a shared VMFS filesystem.
    SAN technology abstracts physical disk from the server. This is well known and understood: the ESX doesn’t talk to the disks, it talks to the cache on the frame, thus a defrag operation (take block at location A and move to location B) doesn’t really “move” the block, as the cache deals with that, so defrag will not have any benefits at the SAN layer. Additionally, by its nature, vmware will always be pure random IO from the frame’s perspective, and defrag can’t gain us anything there either.
    Now the big part that the author failed to look at: how things are from the NTFS point of view in the guest OS. This is a HUGE consideration. Every file location on an NTFS volume is tracked in the MFT (master file table). The MFT is a flat linear file, and 1024 bytes are allocated per MFT entry, which holds file attributes and the extent data describing each extent that a file sits on in the file system. An extent in this context is defined as a series of contiguous NTFS clusters (blocks). A contiguous file has one extent entry, essentially “starting offset and length”. A fragmented file can have many extent entries. Additionally, a heavily fragmented file may fill up the 1024 bytes for its MFT entry, and it would then have to append a new MFT entry to continue with the extent descriptors. Remember I said it’s a linear table, thus it can only be placed at the end of the MFT. Now let’s take the fragmentation to the extreme: what if the MFT reserved space fills up? The system will start taking free space blocks and reserving them for the MFT, and now the MFT itself is fragmented. So instead of the guest OS issuing two reads (one for the MFT entry and one for the actual data), it would have to do multiple reads just to get the MFT and then many more additional reads to read in each extent. Now multiply this by the number of systems chattering down the same FC channel to the VMFS and you quickly have performance degradation and the CIO calling you asking why his email is taking forever to load up. :) Granted, one can likely mask this by implementing auto-tiering at the SAN, widening the IO channel, or setting up preferential shares/limits on VM IO access, but at the end of the day the stack is not optimized and it will have to be addressed.
    In conclusion, while I feel that the author makes valid points that defrag is not necessary for the most general of scenarios, it doesn’t look at the entire ecosystem as a whole, and thusly is flawed. For maximum performance and efficiency to scale (up or out), I still advocate that defrag does have real and tangible benefits in the virtual environment and should be implemented on targeted systems where warranted based on the storage characteristics of those workloads.

  8. Hi Andrew,
    First, let me thank you for posting this comprehensive comment on the blog. I’m sure many readers will find this interesting.
    You make a very valid point, and it is something Ryan raised in a previous posting. There can be corner cases (I believe you use the term ‘extreme’) where massive fragmentation of the filesystem will lead to degraded performance. You are of course quite right, and I don’t argue that point with you. The primary purpose of this article was to highlight the concerns around doing defragmentation on Virtual Machine disks, and the negative impact it can have.
    But, as you have eloquently pointed out above, there could be some very extreme cases where defragmentation will be necessary in order to get your VMs performing optimally. All I would say is that certain considerations would need to be taken into account if you are going to do defragmentation in a virtual environment, and I’m hoping those are pointed out clearly in this article.

  9. Hi Chogan,
    Andrew’s comments about NTFS are the essence of this problem. The work is being done in the guest and NTFS behavior affects everything downstream including VMware and the disks.
    Our company is a VMware Elite TAP member and the developers of a Windows guest optimizer. Last year we worked with Scott Drummonds from VMware’s performance engineering group to quantify the benefits of defragmenting guest servers. We used VMware’s vscsiStats utility to collect the data and there were several metrics that were very interesting.
    1. Total I/O across the stack was reduced by 28%.
    2. A 12x increase in the largest I/O transfers
    3. A 50% reduction in I/O taking more than 30ms to complete
    4. Sequential I/O increased by over 50%
    5. System throughput increased 28%.
    Fewer and larger I/O produce fewer SCSI commands across the stack. This in turn reduces physical I/O to the disk with a positive effect on disk latency and throughput.
    Increasingly vendors are including features that work around and/or accommodate VMware. Allowing or disallowing defrag based on VM drive type and setting optimization strategy is one example. There are also strategies for working with thin-provisioning.

  10. Hi Bob, thanks for the comment.
    If these results are published in a white paper, please share the link. I’d really like to see the type of back-end storage (SAN, NAS, local), the improvement it made to random I/Os, and how your product avoids inflating VM features like thin provisioning and snapshots (which was the primary reason for the original article). Thanks.

  11. Hi there,
    I currently have a VM with Windows 2k3 running on my ESX host and the data storage is an EqualLogic PS4000 SAN.
    The fragmented files on the data partition in the guest OS are over 80% and I can only use the 20% left.
    How can I work around this issue?
    Thanks

  12. Hello Andy,
    I’m not 100% sure what the question is that you are asking, but if you are looking for mechanisms to defrag the Guest OS, then you can use the basic in-guest defragmentation tools, or if you are looking for more comprehensive products, I suggest you check out products from vendors like Diskeeper & Raxco.
    Keep in mind the effects defrag’ing can have on the virtual machine however, as described in the post.
    Hope this helps.
    Cormac

  13. Has anybody tried making NTFS filesystems with 64K clustersize
    on the guest instead of the default 4K size?
    –ghg

  14. It’s very useful to me because what I was looking for I found at the corner of this page, so I am very glad that impossible things are also

  15. I am trying to reconcile the storage amount that my Guest OS thinks it is using and what vSphere reports.

    There is a large discrepancy.
    Guest OS thinks it is using 3.8GB (sum of all used space reported by df)

    For this guest in vSphere terminology:
    Provisioned Space = 35GB
    Used Space: 11.78GB

    so where is that extra ~8GB?

  16. Hi, thank you for this! It has been 1.5 years since your original blog and there were lots of comments about forthcoming guest-OS defraggers; do you have any new data to share in this regard?

    Since I just did this, I was wondering about the effects of a VM residing on an SSD. From the guest OS perspective, it is a hard drive, and you aren’t supposed to defrag an SSD. IMO, it would be beneficial for VMware to pass along this drive fact to the guest OS, if it isn’t already; the Windows defragger appears to be quite happy to run, although I’m too chicken to actually let it do so :-)

    • The original post was aimed primarily at Windows defrag. Since then, products from our partners (such as Condusiv & Raxco) have been updated to address many of the concerns outlined in this post.

      On the SSD front, we’re working on some features that will make vSphere very much aware of SSD, but I can’t say much more about that at the moment. I hope to be able to talk about it more in the near future.
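
The extent bookkeeping described in comment 7 above can be made concrete with a small Python sketch. It does not parse a real MFT; the `to_extents` helper and the cluster numbers are invented purely to show how the same eight clusters need one “starting offset and length” entry when contiguous and several when fragmented.

```python
def to_extents(clusters):
    """Collapse a file's cluster numbers (in file order) into
    (start, length) extents, i.e. runs of consecutive clusters."""
    extents = []
    for cluster in clusters:
        if extents and cluster == extents[-1][0] + extents[-1][1]:
            # This cluster directly follows the current run: extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # Gap or backwards jump: a new extent begins here.
            extents.append((cluster, 1))
    return extents


# Two hypothetical 8-cluster files.
contiguous = list(range(100, 108))
fragmented = [100, 101, 340, 341, 342, 77, 900, 901]

print(to_extents(contiguous))
print(to_extents(fragmented))
```

The contiguous file collapses to a single extent, while the fragmented one needs four, and each extra extent is another descriptor to store and another non-adjacent region to read.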
