I often get asked for advice on tuning the DRS migration threshold, and my standard answer is to stick with the default. However, I can appreciate that when you provide a nice-looking slider bar with five possible settings, there is a natural tendency to want to change things, and convincing people to resist this urge takes a bit more than a simple “stick with the default.” So let me explain why, in most situations, keeping the default migration threshold really makes sense.
1. The DRS algorithms are complex, and even seemingly minor changes can have far-reaching side effects should resource contention develop. VMware engineers put a lot of thought into the default values, and what better advice can I offer than to go with what the experts recommend?
2. Many times I find that admins want to tune/change the migration threshold right out of the box without a clear goal in mind. While tuning to improve things or fix a specific problem makes sense, tuning just for the sake of tuning can create problems.
3. It’s not uncommon to find that a perceived DRS issue driving the desire to tune/change the migration threshold is actually the result of something outside of DRS, in which case tuning the migration threshold can be counterproductive.
For example, I recently talked to an admin who was concerned because, following an upgrade to ESXi 4.1, DRS didn’t seem to be doing as good a job of balancing memory. In discussing his concern, I discovered that DRS was fine and the problem was confusion about how to interpret the memory statistics displayed in vCenter. Once the memory statistics were put in the proper context, the admin could see that DRS was doing a great job of balancing memory. He commented that he had been playing with different migration threshold settings for several weeks in an effort to “fix” DRS, which is unfortunate considering there was never a problem.
Of course, this naturally raises the question: “When should I change the default migration threshold?” The decision should be based on the number of DRS-induced migrations occurring in your cluster. If you feel DRS is doing too many migrations and putting extra overhead on your hosts, you could move the migration threshold to a more conservative setting. On the other hand, if you aren’t seeing much DRS activity in your cluster and feel you could benefit from more migrations, you might choose a higher, more aggressive setting.
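To make the link between the slider position and the volume of migrations concrete, here is a simplified Python model. It assumes the vSphere DRS behavior as I understand it: each migration recommendation carries a priority from 1 (most important) to 5 (least important), and a threshold at level N acts on recommendations of priority 1 through N, with level 3 as the default. The `Recommendation` class, the `applied()` helper, and the VM and host names are all hypothetical; in a real cluster DRS computes the priorities itself from the measured load imbalance.

```python
from dataclasses import dataclass

# Assumed slider model: 1 = most conservative, 5 = most aggressive, 3 = default.
DEFAULT_THRESHOLD = 3


@dataclass
class Recommendation:
    """A hypothetical DRS migration recommendation."""
    vm: str
    target_host: str
    priority: int  # 1 = most important, 5 = least important


def applied(recs, threshold=DEFAULT_THRESHOLD):
    """Return the recommendations acted on at a given slider level.

    Simplified model (an assumption, not VMware code): a level-N threshold
    acts on recommendations with priority 1 through N.
    """
    if not 1 <= threshold <= 5:
        raise ValueError("threshold must be between 1 and 5")
    return [r for r in recs if r.priority <= threshold]


# Illustrative recommendations, one at each priority level.
recs = [
    Recommendation("vm-a", "esx-01", 1),
    Recommendation("vm-b", "esx-02", 2),
    Recommendation("vm-c", "esx-03", 3),
    Recommendation("vm-d", "esx-01", 4),
    Recommendation("vm-e", "esx-02", 5),
]

# Show which recommendations each slider level would act on.
for level in range(1, 6):
    print(level, [r.vm for r in applied(recs, level)])
```

In this model, moving one notch toward conservative drops the lowest-priority moves, which is why it cuts migration volume; moving toward aggressive admits moves that promise smaller balance improvements in exchange for more vMotion activity.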
So, in summary, always start with the default migration threshold. Monitor the number of DRS-invoked migrations over a period of time. If you decide, based on the number of DRS-invoked migrations (or lack thereof), that it makes sense to change the default, adjust things slowly over time and see how it works. Just be careful to ensure that you have a good handle on the “issue” you are addressing and a clear goal for the expected impact of the change.
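For the monitoring step, one simple approach is to tally DRS-initiated migrations per day and compare the average over a baseline window with the average after each small adjustment. The sketch below assumes you already have a list of migration records, each tagged with its date and whether DRS initiated it. That record format, the function names, and the sample data are all hypothetical, and how you export the records from vCenter is left to you.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean


def daily_drs_migrations(log):
    """Count DRS-initiated migrations per calendar day.

    `log` is a list of (day, initiated_by_drs) tuples; this record format is
    an assumption. Manual vMotions are skipped because they say nothing
    about how DRS is behaving.
    """
    return Counter(day for day, by_drs in log if by_drs)


def average_per_day(counts, start, num_days):
    """Mean DRS migrations per day over a window, counting quiet days as zero."""
    days = [start + timedelta(days=i) for i in range(num_days)]
    return mean(counts.get(d, 0) for d in days)


# Made-up sample records, for illustration only.
log = [
    (date(2024, 3, 4), True),
    (date(2024, 3, 4), True),
    (date(2024, 3, 4), False),  # manual vMotion, ignored
    (date(2024, 3, 5), True),
    (date(2024, 3, 7), True),
]

counts = daily_drs_migrations(log)
baseline = average_per_day(counts, date(2024, 3, 4), 7)
print(f"Average DRS migrations per day: {baseline:.2f}")
```

Run the same calculation over equal-length windows before and after each change; averaging over a week or more keeps a single busy day from driving the decision.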