
Tag Archives: Storage DRS

Attention Storage DRS & Storage I/O Control Users…

…please help us shape the future direction of these products.

The new Product Manager for Storage DRS and Storage I/O Control has asked me to reach out to the community and ask for your input on what we should do next in these product areas.

I know many of you actively use these technologies, so you are certainly the best folks to highlight what works well, what doesn’t work well and what additional features you would like to see added.

Please take the quick survey on Storage DRS & Storage I/O Control by clicking here.

Get notified of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

 

vSphere 5.1 New Storage Features

vSphere 5.1 is upon us. The following is a list of the major storage enhancements introduced with the vSphere 5.1 release.

VMFS File Sharing Limits

In previous versions of vSphere, the maximum number of hosts which could share a read-only file on a VMFS volume was 8. The primary use case for multiple hosts sharing read-only files is of course linked clones, where linked clones located on separate hosts all shared the same base disk image. In vSphere 5.1, with the introduction of a new locking mechanism, the number of hosts which can share a read-only file on a VMFS volume has been increased to 32. This makes VMFS as scalable as NFS for VDI deployments & vCloud Director deployments which use linked clones.

Space Efficient Sparse Virtual Disks

A new Space Efficient Sparse Virtual Disk aims to address certain limitations with Virtual Disks. The first of these is the ability to reclaim stale or stranded data in the Guest OS filesystem/database; SE Sparse Disks introduce an automated mechanism for reclaiming this stranded space. The other feature is a dynamic block allocation unit size: SE Sparse Disks have a new configurable block allocation size which can be tuned to the recommendations of the storage array vendor, or indeed of the applications running inside the Guest OS. VMware View is the only product that will use the new SE Sparse Disk in vSphere 5.1.


Virtualized vCenter in a Datastore Cluster

By Frank Denneman – Senior Technical Marketing Architect

To virtualize vCenter or not, that has been an age-old question. To control the location of the vCenter virtual machine in case the hosting vSphere server crashes, a couple of best practices were developed, such as disabling DRS, creating host-VM affinity groups, or my favorite: leaving all resource settings at their defaults and simply documenting which datastore the virtual machine is stored on.

Restrict Storage Load Balancing operations?
However, with the introduction of Storage DRS and its load balancing mechanisms, this recommendation requires some extra configuration. In my opinion DRS should not be limited in its load-balancing options, as disabling it for a specific virtual machine can affect other virtual machines as well. That advice concentrates purely on the compute level, but how should we treat the disk structure of vCenter? Should we restrict the load-balancing operations of Storage DRS?

I think it depends on the configuration of the compute layer: if you do not restrict the movement of the VM at the compute cluster layer, it is recommended to restrict the movement of the virtual machine files. By disabling the automation level of the virtual machine in the datastore cluster, Storage DRS will not move the virtual machine. Please be aware that datastore maintenance mode will fail until the virtual machine is manually migrated out of the datastore.

To ensure best performance, the vCenter virtual machine should receive more disk shares than other virtual machines.

Increase disk shares (a scripted sketch follows these steps):

  1. Select the virtual machine and go to edit settings.
  2. Select the Resource Tab and select the option Disk.
  3. Select one of the three predefined Share levels or configure custom shares.
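
For those who prefer to script this change, below is a minimal pyVmomi sketch of the same disk shares adjustment. The service instance si, the VM name and the shares value are placeholders for illustration; treat it as a sketch to verify in a test environment, not a definitive implementation.

  from pyVmomi import vim

  def set_vcenter_disk_shares(si, vm_name, shares_value=2000):
      """Raise the I/O shares on every virtual disk of the named VM (sketch)."""
      content = si.RetrieveContent()
      view = content.viewManager.CreateContainerView(
          content.rootFolder, [vim.VirtualMachine], True)
      vm = next(v for v in view.view if v.name == vm_name)
      view.Destroy()

      changes = []
      for device in vm.config.hardware.device:
          if isinstance(device, vim.vm.device.VirtualDisk):
              # Assumes storageIOAllocation is populated (it normally is on 5.x hosts);
              # the 'custom' level with an explicit value mirrors the GUI option.
              device.storageIOAllocation.shares = vim.SharesInfo(
                  level='custom', shares=shares_value)
              changes.append(vim.vm.device.VirtualDeviceSpec(
                  operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
                  device=device))

      return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=changes))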

Disable migration (also sketched after these steps):

  1. Select the Datastores and Datastore Clusters view.
  2. Select the Datastore Cluster.
  3. Go to Edit Datastore Cluster.
  4. Select the Virtual Machine Setting menu option.
  5. Select the Manual or Disabled Automation level.
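
The same per-VM automation override can be applied through the vSphere API. The sketch below uses pyVmomi's storageDrs types with ConfigureStorageDrsForPod_Task; the si, pod and vm references are placeholders, and the property names should be checked against your SDK version.

  from pyVmomi import vim

  def override_sdrs_automation(si, pod, vm, enabled=False):
      """Add a per-VM Storage DRS override on the datastore cluster 'pod' (sketch)."""
      vm_override = vim.storageDrs.VmConfigSpec(
          operation='add',                      # use 'edit' if an override already exists
          info=vim.storageDrs.VmConfigInfo(
              vm=vm,
              enabled=enabled,                  # False = Storage DRS never moves this VM
              behavior='manual'))               # only consulted while enabled is True

      spec = vim.storageDrs.ConfigSpec(vmConfigSpec=[vm_override])
      srm = si.RetrieveContent().storageResourceManager
      return srm.ConfigureStorageDrsForPod_Task(pod=pod, spec=spec, modify=True)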

 

VMDK affinity rule
The virtual machine settings window also allows you to configure the VMDK affinity rule setting of the virtual machine. By default a VMDK affinity rule is applied to each virtual machine, and this rule forces Storage DRS to keep all disks and files of a VM on a single datastore. I am a strong proponent of using a VMDK anti-affinity rule for all virtual machines in a datastore cluster, as it allows Storage DRS more granularity when it comes to load balancing. But for this particular scenario both configurations have merit.

The datastore where the working directory with the VMX file is placed is deemed the registered datastore. If an anti-affinity rule is applied to the VM, all VMDKs and the working directory are placed on separate datastores. When Storage DRS needs to load balance, it is very (extremely) rare that it will move the working directory of a virtual machine: a working directory is very small compared to a VMDK and generates very little I/O under normal operations. When will it move a working directory? When the VM swap file is very big, nearing or exceeding the footprint of the other VMDK files, and/or when the swap file is generating a lot of I/O. Take this behavior into account when not selecting a manual or disabled automation level!

Using an affinity rule reduces complexity when it comes to troubleshooting. All files are on the registered datastore.
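
If you want to script the VMDK affinity setting as well, the same StorageDrsVmConfigInfo object used above carries an intraVmAffinity flag. A hedged sketch, reusing the si, pod and vm placeholders from the previous example:

  from pyVmomi import vim

  def set_vmdk_affinity(si, pod, vm, keep_together=True):
      """Toggle the default 'keep VMDKs together' rule for one VM (sketch)."""
      vm_override = vim.storageDrs.VmConfigSpec(
          operation='add',
          info=vim.storageDrs.VmConfigInfo(vm=vm, intraVmAffinity=keep_together))
      spec = vim.storageDrs.ConfigSpec(vmConfigSpec=[vm_override])
      srm = si.RetrieveContent().storageResourceManager
      return srm.ConfigureStorageDrsForPod_Task(pod=pod, spec=spec, modify=True)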

So the next time you set up and configure a virtualized vCenter, do not think only about the DRS settings; think about the Storage DRS settings as well.

 

 

What could be writing to a VMFS when no Virtual Machines are running?

[Updated with vSphere HA clarifications]

This was an interesting question that came my way recently. One of our storage partners wanted to ensure that a VMFS volume was completely quiesced (no activity) and was interested to know what could possibly be the cause of writes to the VMFS volume when all Virtual Machines were powered off.

There are quite a few vSphere features which could be updating a volume, and after a bit of research, I decided it might be a good idea to share the list with you.

  1. If you have a Distributed Virtual Switch in your virtual infrastructure, changes to the network configuration would result in updates to the .dvsdata configuration file which sits on a VMFS volume. 
  2. If you have implemented a vSphere HA cluster, then there may be updates going to vSphere HA 5.0 heartbeat datastore and related files. First, what are these heartbeat datastores used for? Well, to have some control over the HA cluster in the event of a network failure when nodes can no longer communicate over the network, vSphere HA introduced heartbeat datastores. Through the use of these HB datastores & special files on other datastores, a master can determine which slave hosts are still alive, and also determine if there has been a network partition rather than network isolation (there will be different behaviour depending on which). Note that we don't write to the HB file; it is opened so that the "metadata HB" on the VMFS volume is updated. Other vSphere HA files, which reside in special folders on all datastores in the cluster, are also written to.
  3. Another possibility, of course, is that writes are coming from the VMFS metadata heartbeat updates. These are essentially pulses from an ESXi host to inform other hosts (which might be looking to update a file) that this host still has a lock on the file in question.
  4. An ESXi host can be deployed with a designated scratch partition or the scratch partition could be placed as a folder on a VMFS datastore if no suitable partition exists. If an ESXi scratch partition has been located on a VMFS datastore, then it may be that the scratch partition is being regularly updated with host information (e.g. tmp files, log updates, etc). This could be the source of spurious writes to the VMFS.
  5. Storage I/O Control could be enabled on the datastore. If this is the case, each host that uses the datastore writes metrics to special files on the datastore. These files are used to determine the datastore wide latency value across all hosts to the datastore. If this exceeds the defined latency value (default 30ms), this is an indicator to SIOC to start throttling. The last update I've seen on this suggests that these files are updated by all hosts every 4 seconds.
  6. Finally, the VMFS volume could be part of a Storage DRS datastore cluster. If load balancing based on I/O metrics is enabled, then Storage DRS may be using Storage I/O Control to measure the datastore latency values as mentioned in number 5.

So as you can see, simply shutting down VMs on a datastore is not enough to ensure that the datastore is quiesced. A number of other vSphere features could be writing to the datastore (I may have even missed some in this list).

If you need a datastore to be completely quiesced for whatever reason, I'd recommend using esxtop to ensure that there is no I/O activity after you have shut down your VMs.

Get notified of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

Storage DRS and Storage Array Feature Interoperability

We've had a number of queries recently about how Storage DRS works with certain array based features. The purpose of this post is to try to clarify how Storage DRS will behave when some of these features are enabled on the array.

The first thing to keep in mind is that Storage DRS is not going to recommend a Storage vMotion unless something is wrong on the datastore; either it is running out of space, or its performance is degrading.

Let's now look at the interoperability:

1. Thin Provisioned LUNs

If the array presents a Thin Provisioned LUN of 2TB which is backed by only 300GB of physical storage, is Storage DRS aware of this when it makes migration decisions? In other words, could we fill up a Thin Provisioned datastore by choosing it as a destination for a Storage vMotion operation when it is already quite full?

Although Storage DRS is not aware that the LUN is Thin Provisioned, it still should not fill it up. The reason why is that in vSphere 5.0, a new set of VAAI features for Thin Provisioning were introduced. One of these features was to surface an alarm in vCenter when a Thin Provisioned datastore became 75% full on the back-end. If a datastore has this alarm surfaced, then Storage DRS will no longer consider it as a destination for Storage vMotion operations. This should prevent a Storage vMotion operation from ever filling up a Thin Provisioned datastore. In this case, if the 2TB Thin Provisioned datastore has 225GB of its 300GB already used, the alarm would be surfaced and Storage DRS would not consider placing any additional VMs on it.

2. Deduplication & Compression

Many storage arrays use deduplication & compression as a space efficiency mechanism. Storage DRS is not dedupe aware, but this shouldn't be a cause for concern. For instance, if a VM is heavily deduped and Storage DRS recommends it for migration, Storage DRS does not know that the VM is deduped, so the amount of space reclaimed from the source datastore will not be the full size of the VM. Also, when the VM is moved to the destination datastore, the VM will have to be inflated to full size. Later on, when the dedupe process runs (in many cases, this doesn't run in real-time), the array might be able to reclaim some space again, but the VM will be temporarily inflated to full size first.

But is this really a concern? Let's take the example of a VM that is 40GB in size, but thanks to dedupe is only consuming 15GB of data on disk. When SDRS makes a decision to move this VM, it will find a datastore that can take 40GB (the inflated size of the VM), so that's not too much of an issue. What about the fact that SDRS is only going to gain 15GB of free space on the source datastore, as opposed to the 40GB that it thought it was going to get? Well, that's not a concern either, because if this datastore still exceeds the space usage threshold after the VM is migrated, SDRS will migrate another VM from the datastore on the next run, and so on until the datastore space usage is below the threshold. So yes, it may take a few more iterations to handle deduped datastores, but it will still work just fine.
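
To make the iteration argument concrete, here is a small, purely illustrative Python model (not VMware code) of a deduped datastore being brought back under its space threshold. All of the numbers are invented for the example.

  def sdrs_space_passes(used_gb, capacity_gb, threshold_pct, vms):
      """Toy model: how many migrations until a datastore drops below its space threshold.

      vms is a list of (provisioned_gb, physical_gb) tuples; the physical figure is
      what a deduping array actually hands back when the VM leaves the datastore.
      """
      threshold_gb = capacity_gb * threshold_pct / 100.0
      moves = 0
      for provisioned_gb, physical_gb in vms:
          if used_gb <= threshold_gb:
              break
          # Storage DRS budgets 'provisioned_gb' on the destination, but the source
          # datastore only gets the deduped 'physical_gb' back.
          used_gb -= physical_gb
          moves += 1
      return moves, used_gb

  # 1TB datastore at 85% used with an 80% threshold, holding 40GB VMs deduped to 15GB:
  moves, used = sdrs_space_passes(850, 1000, 80, [(40, 15)] * 10)
  print(moves, used)   # 4 moves (instead of 2 for non-deduped VMs), ending at 790GB used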

And yes, it would be nice if Storage DRS understood that datastores were deduped/compressed, and this is something we are looking at going forward.

3. Tiered Storage

The issue here is that the Storage I/O Control (SIOC) injector (the utility which profiles the capabilities of the datastores for Storage DRS) might not understand the capabilities of tiered storage, i.e. if the injector hits the SSD tier, it might conclude that this is a very high performance datastore, but if it hits the SATA tier, it might conclude that this is a lower performance datastore. At this point in time, we are recommending that SDRS be used for initial placement of VMs and load balancing of VMs based on space usage only, and that the I/O metrics feature is disabled. We are looking into ways of determining the profile of a LUN built on tiered storage going forward, and allowing I/O metrics to be enabled.
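
If you want to apply that recommendation programmatically rather than through the client, a hedged pyVmomi sketch follows; it keeps Storage DRS enabled on the datastore cluster but turns off I/O metric load balancing. The si and pod references are placeholders, and the storageDrs property names should be verified against your SDK version.

  from pyVmomi import vim

  def use_space_only_balancing(si, pod):
      """Keep Storage DRS on but disable I/O-metric load balancing (sketch)."""
      pod_spec = vim.storageDrs.PodConfigSpec(
          enabled=True,                 # Storage DRS itself stays enabled
          ioLoadBalanceEnabled=False)   # recommendations based on space usage only
      spec = vim.storageDrs.ConfigSpec(podConfigSpec=pod_spec)
      srm = si.RetrieveContent().storageResourceManager
      return srm.ConfigureStorageDrsForPod_Task(pod=pod, spec=spec, modify=True)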

I hope this gives you some appreciation of how Storage DRS can happily co-exist with various storage array features, and how in many ways the technologies are complementary. While we would agree that some of the behaviour is sub-optimal, and it would be better if Storage DRS were aware of these array based features in its decision process, there is nothing that prevents Storage DRS from working with these features. Going forward, we do hope to add even more intelligence to Storage DRS so that it can understand these features and include them in its decision making algorithms.

Get notified of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

Storage DRS Affinity & Anti-Affinity Rules

By now you should be well aware that one of the major storage and resource management enhancements in vSphere 5.0 is Storage DRS. What was one of the motivations behind developing this feature? For some time we have had the Distributed Resource Scheduler (DRS) feature in vSphere, which managed the initial placement and load balancing of virtual machines based on CPU and memory utilization. However, there was still the possibility that VMs could be placed on the same datastore; even if that datastore was nearing capacity or VM performance on it was degrading, there was nothing in DRS to prevent additional VMs from being placed on it. Storage DRS addresses this by selecting the best datastore for initial placement, and also uses Storage vMotion to migrate virtual machines between datastores when capacity or I/O latency is an issue.

In previous postings I already discussed initial placement and load balancing based on datastore capacity and I/O latency. However, there is another cool feature of Storage DRS that I haven't yet discussed: the affinity and anti-affinity rules. These rules are conceptually very similar to the affinity and anti-affinity rules that you might find in DRS. The rules basically work by keeping VMs together on the same datastore or apart on different datastores, in much the same way that the rules in DRS keep VMs together on the same host or apart on separate hosts. In DRS, you might have separated out your primary and secondary DNS servers using anti-affinity rules. In this way, if one ESX host failed and brought down one of the DNS servers, the other DNS server stays running on another host in the cluster. However, there was nothing to stop both the primary and secondary DNS servers residing on the same datastore, and if that datastore failed, so did both servers. Now, with Storage DRS anti-affinity rules, you can keep these DNS servers (or any other primary/secondary servers) on different datastores.

However, there is another significant feature of Storage DRS affinity & anti-affinity rules, and this is the ability to automatically keep Virtual Machine Disks (VMDKs) together on the same datastore or apart on different datastores. By default, VMDKs are placed together on the same datastore. So why might I want to place VMDKs on different datastores? Well, one example that I thought of was that some of our customers use in-Guest mirroring and RAID volumes in the Guest OS. In this case, you would want to make sure that the primary volume and its replica are kept on different datastores. If both sides of the mirror were on the same datastore, and that datastore failed, you would lose both sides of the mirror.
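
For completeness, here is a hedged pyVmomi sketch of configuring such a VMDK anti-affinity rule for the in-guest mirroring case. The VirtualDiskAntiAffinityRuleSpec type, its diskId property and the intraVmAntiAffinity field are taken from the vSphere API reference and should be verified against your pyVmomi version; si, pod, vm and the disk device keys are placeholders.

  from pyVmomi import vim

  def keep_vmdks_apart(si, pod, vm, disk_keys, rule_name='mirror-split'):
      """Keep the listed virtual disks of one VM on different datastores (sketch).

      disk_keys are the VirtualDisk device keys, e.g. [2000, 2001] (placeholders).
      """
      rule = vim.storageDrs.VirtualDiskAntiAffinityRuleSpec(
          name=rule_name, enabled=True, diskId=disk_keys)
      vm_override = vim.storageDrs.VmConfigSpec(
          operation='add',
          info=vim.storageDrs.VmConfigInfo(
              vm=vm,
              intraVmAffinity=False,        # the default keep-together rule must be off
              intraVmAntiAffinity=rule))
      spec = vim.storageDrs.ConfigSpec(vmConfigSpec=[vm_override])
      srm = si.RetrieveContent().storageResourceManager
      return srm.ConfigureStorageDrsForPod_Task(pod=pod, spec=spec, modify=True)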

[Screenshot: Rules]


This is yet another reason why Storage DRS is one of the most highly regarded features in vSphere 5.0.

Get notified of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

Storage vMotion, Storage DRS and vSphere Replication Interoperability – Revisited

Last week I posted a blog article which described the interoperability status between VMware's Site Recovery Manager 5.0 and Storage DRS. In this article, I described how Storage DRS could bring about additional administrative steps to keep the recovery plans up to date, and open a small window where Virtual Machines may not be protected. For that reason, VMware is not supporting these features/products being used together. We did not want to risk a situation where a failover to the DR site did not complete successfully.

In the same article I also mentioned that VMware would not be supporting Storage vMotion/Storage DRS with vSphere Replication. vSphere Replication (VR) is a new feature of SRM 5.0, and allows Virtual Machines to be protected across different sites without the need for storage array replication. The support statement is still true – VMware will not be supporting Storage vMotion/Storage DRS with VR. However, the reason I gave for the lack of support was not accurate. Thanks to some feedback internally, I can give the correct reason here.

First, contrary to what I said in my initial blog post, since VR operates at the level of virtual disks, VR does not have a dependency on the datastore that a VM's disk is stored on. But there are other considerations when using these technologies together. Let's now look at the primary and replica VR sites, and how Storage vMotion/Storage DRS can impact each.

At the primary site, because of the way VR works, there are two separate cases of Storage VMotion/Storage DRS support to consider:

  1. Moving some subset of the VM's disks
  2. Moving the VM's home directory.

In the case where some subset of the VM's disks are migrated with Storage VMotion/SDRS, things should work. From VR's point of view, the Storage VMotion operation looks like a "fast suspend/resume" and VR handles that fine.

The problem at the primary site comes from the second case: when doing a Storage VMotion of a VM's home directory.  In this case, the VR persistent state files (".psf") are not migrated, but are deleted.  From the point-of-view of VR, this looks like a power-off, followed by a power-on of the VM without the ".psf" files.  This triggers a VR "full sync" (the disk contents are read and checksummed on each side).  Since the primary and replica are basically in sync, very little data should actually be transferred.  While this is relatively expensive (the entire disk is read and checksummed, at both sites), there is no correctness problem.  Note that this expense could be high enough (depending on the size of the disks) that VR may miss an RPO (Recovery Point Objective) window or two. This means that VMs could be more out of date when recovered than originally planned.

At the replica site, the interaction is less complicated.  Storage DRS cannot see the replica disks. These are just "disks"; there is no "VM". The VR disks are not actually attached until test-bubble or fail-over time. Therefore Storage DRS cannot move these disks since it can only see VMs.  This means there are no low-level interop problems, but there is a high-level one as we would all like Storage DRS to see the replica disks and be able to move them out of the way if a datastore is filling up at the replica site.

Overall the interoperability isn't great in this initial release.  At the primary site there are no serious correctness problems, just the performance impact of a VR "full sync" that can be triggered by Storage VMotion. At the replica site there are no immediate problems, but the higher-level lack of visibility is an issue.

Once again, as I stated in the previous blog, VMware is tracking full interoperability of Storage vMotion & Storage DRS with Site Recovery Manager & vSphere Replication as a high priority item.

Get notified of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

vSphere Storage Appliance 1.0 & Site Recovery Manager (SRM) 5.0 Interoperability

A short posting this time, but something that is coming up a lot at the moment. The question is "Can I protect VMs that reside on vSphere Storage Appliance datastores with Site Recovery Manager 5.0?".

The answer is yes you can. Virtual machines that reside on the vSphere Storage Appliance (VSA) can be protected by SRM 5.0 using vSphere Replication (VR). VSA does not require a Storage Replication Adapter (SRA) to work with SRM 5.0. This statement will appear shortly in the SRM 5.0 Release Notes. For more about SRM 5.0 & the new vSphere Replication feature, have a read of this blog article.

Here is a screenshot of me successfully protecting a VM using vSphere Replication. The VM resides on an NFS datastore in a VSA cluster and, in this example, is being replicated to a local datastore on an ESXi 5.0 server at a remote site:

[Screenshot: Vr-2]

This screenshot shows the initial synchronization of the VM, including the datastore details. VSADs-1 is an NFS datastore in the VSA Cluster:

[Screenshot: Vr-1]
It works just fine and is fully supported. There is one consideration, however, and that is vSphere licensing. While we are aiming the VSA at small to medium businesses, and doing some special offers bundling the VSA with Essentials+, the lowest license tier which supports SRM is Standard. Just keep that in mind if you are planning a solution using the VSA 1.0 & SRM 5.0 products.

Get notified of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage
