

Purpose-built guest OS datastores – Don’t do it!

Today we have a guest post from Duncan Armstrong, a Tech Support Engineer in our Canadian office who specializes in fault and storage issues. He’s currently focused on writing Knowledge Base articles, as we want to capture as much real-world troubleshooting know-how as possible and share it with everyone. Today, he has some advice for us.

Since joining VMware Technical Support several years ago, I have often found myself pondering the decision-making behind some of the storage configurations employed in VMware Infrastructure/vSphere environments. One configuration in particular came up more often than I liked, and while unfortunate, it was hardly surprising to see those deployments result in unplanned outages or even hamper recovery efforts. The version of ESX didn’t matter, nor did the type of storage (NAS or SAN, iSCSI or FC). The issue did not discriminate by technology or brand; it simply punished those who chose to organize their datastores in this fashion.

I’ll get to the point: dedicated Guest Operating System datastores are not a good idea, no matter how organized they may appear. The same often goes for dedicated Data Disk datastores.

 

What? Why are these used?

I have the impression these “split” configurations, with dedicated Guest OS datastores and Data Disk datastores, were created more for virtual machine file organization than for performance or space optimization. Customers with these setups know exactly where their guests and configuration files live (the Guest OS Datastore) and where all their data drives are (the remaining Data Datastores). However, deployments or consulting engagements that push for this kind of setup are misguided.

There are several ways these configurations can result in problems.

Event and time-based load spikes

After an outage or other problem, bringing your guests back online is going to take longer: all of the reads and writes required for the mass startup are concentrated on the few Guest OS Datastores residing on your storage. Expect those datastores to run at maximum load until every guest has completed its startup sequence.

Once those operating systems are up, the load on your guest system drives will start to drop, and load from the production data disks will likely begin to rise. The latter is perfectly fine; it is the concentrated system-drive load that translates into extended startup times following an outage.

System disks aside, placing too many production-load guest Data disks on a single datastore is likely to cause similar performance problems during production hours, and especially during backup windows. You may also see a consistent performance decline from overloading a given disk set around the clock.
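If you want a quick read on how concentrated things are, a short PowerCLI sketch along these lines (just an illustration, assuming the standard Get-VM and Get-HardDisk cmdlets) counts how many virtual disks sit on each datastore:

# Rough check: how many virtual disks are concentrated on each datastore?
# Each hard disk's Filename looks like "[DatastoreName] vmfolder/disk.vmdk".
Get-VM | Get-HardDisk | ForEach-Object {
    ($_.Filename -split '\]')[0].TrimStart('[')
} |
    Group-Object |
    Sort-Object Count -Descending |
    Select-Object @{N='Datastore'; E={ $_.Name }}, Count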

I also see some customers dedicate an entire datastore to a single virtual machine disk, a one-Datastore-to-one-Guest-Data-Disk ratio. This is also suboptimal and wastes space: delta files are not stored in these locations, so the space left over on each Data Disk Datastore is never utilized. That may not be a problem if you planned for it, but you are also unnecessarily consuming LUN IDs (if you are using VMFS or SAN storage). ESX supports up to 256 LUNs per host; see the VMware documentation, specifically Configuration Maximums for VMware vSphere 4.1, for more information.

Note: Thin-provisioned disks in vSphere are an exception here; they start small and grow into that leftover space over time.
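Getting back to the LUN count: as a rough gauge of how many LUN IDs you are consuming, something like this PowerCLI snippet (illustrative only; each VMFS datastore typically maps to one LUN, though extents can use more) compares your VMFS datastore count against that documented maximum:

# Count VMFS datastores against the vSphere 4.1 maximum of 256 LUNs per host.
# Note: each VMFS datastore usually occupies one LUN; extents add more.
$vmfs = Get-Datastore | Where-Object { $_.Type -eq 'VMFS' }
"{0} VMFS datastores in use (configuration maximum: 256 LUNs per host)" -f @($vmfs).Count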

Over-committing ESX host memory drives activity to the Virtual Machine Swap (.vswp) files, which by default reside in the same directory as each virtual machine’s configuration files. Having every virtual machine across multiple hosts generating heavy read/write activity against one common LUN is not going to be a good experience.

Unexpected space consumption

As with the last point above, you may not expect Virtual Machine Swap (.vswp) files to be created at power-on, consuming additional Guest OS Datastore space. The swap file is sized 1:1 with the virtual machine’s memory allocation: an 8GB RAM virtual machine gets an 8GB .vswp file unless you configure a memory reservation. Powering on 20 guests with 4GB of memory each on the same datastore creates 80GB of virtual machine swap files.
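If you would rather not be surprised, a PowerCLI sketch along these lines (an estimate only; it assumes a reasonably recent PowerShell/PowerCLI, the default swap location in each VM’s working directory, and no memory reservations) tallies the .vswp footprint each home datastore would carry once everything is powered on:

# Estimate the per-datastore .vswp footprint at power-on, assuming the default
# swap location (the VM's working directory) and no memory reservations.
Get-VM | ForEach-Object {
    # The home datastore is the one named in the .vmx path, e.g. "[OS-DS] vm/vm.vmx"
    $homeDs = ($_.ExtensionData.Config.Files.VmPathName -split '\]')[0].TrimStart('[')
    [PSCustomObject]@{
        Datastore = $homeDs
        SwapGB    = [math]::Round($_.MemoryMB / 1KB, 1)
    }
} |
    Group-Object Datastore |
    Select-Object @{N='Datastore'; E={ $_.Name }},
                  @{N='EstimatedSwapGB'; E={ ($_.Group | Measure-Object SwapGB -Sum).Sum }} |
    Sort-Object EstimatedSwapGB -Descending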

The real killer in these setups is snapshotting a virtual machine. Each virtual disk’s delta file (redo log) is created in the same directory as the virtual machine’s configuration files, just like the .vswp file above. In these storage configurations the “OS Datastore,” by its very role, ends up holding all of the delta files at once. That quickly becomes a problem, because such a datastore is usually fairly small to begin with (to save room on the SAN/NAS for “Data,” of course).

Because the actual data virtual disks are large, and commonly show demanding performance and space-consumption characteristics (the latter due to the extra delta/redo logging), they often completely exhaust the storage space on the “Guest OS Datastore,” where the deltas are created and grow to accommodate disk changes. Delta disks for databases and other “volatile” virtual machines, which generate a great deal of change in a short time frame, are of particular note. Datastore space exhaustion means an outage for every virtual machine that still needs additional space on the affected datastore; none of them can resume until space is freed.

Space exhaustion while running on snapshots is a common call-generator when this setup is used. Fixing it is routine work for VMware Support, but it demands expertise, familiarity with how snapshots function, and, frankly, some thinking on your feet.
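A simple report can at least show you how close to the edge you are. The PowerCLI sketch below (again illustrative, assuming default delta placement in the VM’s home directory and a PowerCLI release that exposes SizeGB on snapshot objects) lists each snapshot next to the free space left on the datastore that has to absorb its growth:

# List each snapshot next to the free space remaining on the VM's home
# datastore, which is where the delta files accumulate by default.
Get-VM | Get-Snapshot | ForEach-Object {
    $dsName = ($_.VM.ExtensionData.Config.Files.VmPathName -split '\]')[0].TrimStart('[')
    [PSCustomObject]@{
        VM         = $_.VM.Name
        Snapshot   = $_.Name
        Created    = $_.Created
        SnapSizeGB = [math]::Round($_.SizeGB, 1)
        Datastore  = $dsName
        DsFreeGB   = [math]::Round((Get-Datastore -Name $dsName).FreeSpaceGB, 1)
    }
} | Sort-Object DsFreeGB | Format-Table -AutoSize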

Conclusion

Please avoid this storage configuration. If you already have one like this, think about how you can start spreading out your operating system disks, even just a bit. Mixing OS and Data disks on the same datastore isn’t inherently a Bad Thing, generally speaking, though there are always exceptions out there. Understand what you are storing and its usage characteristics, and spread out your loads (even across the course of a day).

Further, by leaving some room to spare on each of your datastores (nothing is stretched too tight and nothing is over-utilized right now, right?), you give yourself breathing room later if, for any reason, you run into problems. Something I like to be able to anticipate is a mass exodus of all virtual machines from one datastore onto the remaining ones. Do I have the capacity for that? Or: could I clone the largest virtual machine I have onto one or more destination datastores in an emergency?
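One way to answer those questions before the emergency is a quick capacity comparison. The sketch below (illustrative; it uses provisioned disk capacity as a stand-in for the space a clone would need, and assumes recent PowerCLI property names like CapacityGB and FreeSpaceGB) finds the largest VM and flags which datastores could still absorb it:

# Find the largest VM by provisioned disk capacity, then flag which datastores
# still have enough free space to absorb a clone of it.
$largest = Get-VM |
    Select-Object Name,
                  @{N='ProvisionedGB'; E={ ($_ | Get-HardDisk | Measure-Object CapacityGB -Sum).Sum }} |
    Sort-Object ProvisionedGB -Descending |
    Select-Object -First 1

Get-Datastore |
    Select-Object Name,
                  @{N='FreeGB';        E={ [math]::Round($_.FreeSpaceGB, 1) }},
                  @{N='FitsLargestVM'; E={ $_.FreeSpaceGB -gt $largest.ProvisionedGB }} |
    Sort-Object FreeGB -Descending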

I hope that helps a bit, or at least gives you some ideas; better yet, perhaps it reassures you about the decisions you have already made to avoid this kind of setup. Don’t get caught setting something like this up. Yes, it will work if you avoid features like snapshots (and accept the performance quirks), but it can be done better, and with less effort.

5 thoughts on “Purpose-built guest OS datastores – Don’t do it!”

  1. Burak Uysal

    I have done it! And it works pretty well in my environment. I have three tiers of datastores. Tier 1: SAS RAID10, Tier 2: SAS RAID5, Tier 3: SATA RAID6.
    Any VM requiring > 1TB has an RDM assigned. I use Tier 2 for OS disks. Total SAN capacity is 200TB mirrored. We have a strict snapshot policy, enforced by a PowerCLI script, to remove any snapshots older than 3 days. Remember: snapshots are not backups. OS disks are spread over multiple Tier 2 LUNs and RAID groups. Oracle & SQL VMs are on Tier 1 datastores.
    For large file systems it would be nice to be able to define the snapshot disk location using the vSphere Client.
    I would be very concerned about keeping OS disks on SATA stores; that would cause performance problems and some dropped I/O due to high latency. With 2TB SATA disks, the IOPS requirement is the decision maker, not the capacity.
    My 2p.

  2. Collin C. MacMillan, VCP4 / vExpert 2010

    Want a great example of this datastore paradigm? VMware View and “user data disks.”
    I find the argument about VMware-based snapshots both valid and the primary reason for concern. In practice, allocating the snapshots for ALL datastores to the configuration datastore seems counterintuitive. In fact, this choice defies most storage-tiering methods and couples the I/O of the highest-performance storage tier to that of the lowest-performance tier for the duration of the snapshot.
    However, when this “default” location for snapshots doesn’t match the use case, KB1002929 has a solution. PowerShell likewise allows VM snapshot delta file locations to be more arbitrary.

  3. Collin C. MacMillan, VCP4 / vExpert 2010

    Oh, yes. It would be worth VMware’s consideration to CHANGE the policy from vCenter to allow snapshot DELTA files to live on the same datastore as the VMDK or RDM link file. This would answer a myriad of performance-based allocation issues. It also seems to make a lot more sense (keeping snapshot deltas in the same filesystem as the associated vmdk).

  4. Duncan Armstrong

    Excellent points. It’s definitely under consideration right now, as tiered-storage applications become more commonplace.

  5. What Is Cloud

    Duncan, I completely agree that an environment with limited governance would run into unique complications on many different levels. I would like to propose that separate OS and data datastores enable functions that would not be available with other configurations. I think this is a valid storage configuration, depending on the goals of the storage-for-virtualization standards design.
    However, if a goal is to enable SRM, use NFS, or build storage tiers – don’t get me started here :-) – then separate OS and Data stores will significantly complicate the management aspects and would impact replication and recovery.
    There are so many different considerations for storage designs that I think taking such a strong stance limits flexibility and options. How do you size and scale storage? When do you purchase more storage? Does the OS swap space need the same performance as the high-end transactional DB? How many VMs can my storage hold? (Data space will always be a variable, but OS space should be tightly governed.)
    If there is a failure and all VMs need to be restarted at the same time, I’m inclined to be more concerned about CPU ready/CPU wait than I would be about the disk IOPS (DAVG/cmd).
