A post today from guest blogger Michael Webster, VCDX #66, vExpert 2012, author of longwhiteclouds.com, specialist in virtualizing business critical apps, and owner of IT Solutions 2000 Ltd. Michael explores a not-so-well-known feature of vSphere 5.

One of the features many people may not be aware of in vSphere 5 is Multiple-NIC vMotion. This feature lets you load balance one or more vMotion streams over multiple physical NICs. That is a significant benefit when you've got VMs and hosts with large amounts of memory, as vMotion migrations complete significantly faster. So your business critical applications with lots of memory and many CPUs can now migrate without disruption even faster. Below I'll briefly cover the good and the great of this technology, and also a gotcha that you need to be aware of.

The Good

I thought we'd start with the good news. With vSphere 5 you can now split single or multiple vMotion streams over multiple NICs. Up to 4 x 10Gb/s NICs or 16 x 1Gb/s NICs are supported. This magnifies even further the already impressive 30% improvement in vMotion performance over vSphere 4.1.

The Great

It is super easy to set up Multi-NIC vMotion. It's all explained in KB: Multiple-NIC vMotion in vSphere 5 (2007467). Briefly, the setup is:

  1. Set up multiple vmkernel port groups, each with a different NIC as active, any other NICs set to standby or unused, and a different IP address on the same subnet.
  2. Tick the vMotion checkbox on each of those vmkernel ports.

That’s it!
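If you'd rather script those two steps than click through the client, the sketch below shows one way to do it with pyVmomi. It's a minimal illustration only: the hostname, credentials, vSwitch name, vmnic names, and IP addresses are all placeholders you'd need to swap for your own, and it assumes a standard vSwitch with two uplinks dedicated to vMotion.

```python
# Minimal pyVmomi sketch of the two setup steps above. All names, addresses
# and credentials are placeholders; adapt them to your own environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Lab-only connection that skips certificate verification.
si = SmartConnect(host='esxi01.lab.local', user='root', pwd='password',
                  sslContext=ssl._create_unverified_context())
# Grab the first host found (fine for a direct ESXi connection).
host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
net_sys = host.configManager.networkSystem
vnic_mgr = host.configManager.virtualNicManager

# Two vMotion port groups, each with a different active uplink,
# the other uplink standby, and its own IP on the same subnet.
teams = [('vMotion-01', 'vmnic2', 'vmnic3', '192.168.10.11'),
         ('vMotion-02', 'vmnic3', 'vmnic2', '192.168.10.12')]

for pg_name, active, standby, ip in teams:
    # Step 1a: port group with an explicit active/standby NIC order.
    pg_spec = vim.host.PortGroup.Specification(
        name=pg_name, vlanId=0, vswitchName='vSwitch0',
        policy=vim.host.NetworkPolicy(
            nicTeaming=vim.host.NetworkPolicy.NicTeamingPolicy(
                nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
                    activeNic=[active], standbyNic=[standby]))))
    net_sys.AddPortGroup(portgrp=pg_spec)

    # Step 1b: vmkernel port with a static IP on the shared vMotion subnet.
    vnic_spec = vim.host.VirtualNic.Specification(
        ip=vim.host.IpConfig(dhcp=False, ipAddress=ip,
                             subnetMask='255.255.255.0'))
    vmk = net_sys.AddVirtualNic(portgroup=pg_name, nic=vnic_spec)

    # Step 2: the "vMotion tick box" -- enable vMotion on this vmkernel port.
    vnic_mgr.SelectVnicForNicType('vmotion', vmk)

Disconnect(si)
```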

Either way, it's very simple. Now single vMotions and multiple concurrent vMotions will be load balanced over the NICs. There is absolutely no need to configure any complicated LACP or IP hash load balancing to make this work, and no need to use Load Based Teaming (Route Based on Physical NIC Load). You can use this with standard switches; there's no need for a distributed switch. It doesn't even require Enterprise Plus licensing, although as the benefits are mostly with VMs and hosts with lots of RAM, you're probably going to have Enterprise Plus anyway.

I tested the performance of Multi-NIC vMotion with 2 x 10Gb/s NICs in my home lab and got almost 18Gb/s when using jumbo frames on vSphere 5. Hosts go into maintenance mode so fast you'd better not blink! I haven't retested Multi-NIC vMotion since upgrading to vSphere 5 U1 and the latest patches; I plan to test it again when Update 2 or the next vSphere release comes out.
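For completeness, here's a similarly hedged pyVmomi sketch of how you might enable jumbo frames (MTU 9000) on the vSwitch and the vMotion vmkernel ports. The device names (vSwitch0, vmk1, vmk2), hostname and credentials are again just placeholders.

```python
# Illustrative sketch only: raise the MTU to 9000 on the vSwitch carrying the
# vMotion port groups and on each vMotion vmkernel port. Names are assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='esxi01.lab.local', user='root', pwd='password',
                  sslContext=ssl._create_unverified_context())
host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
net_sys = host.configManager.networkSystem

# Reuse the existing vSwitch spec and only change its MTU.
vswitch = next(v for v in net_sys.networkInfo.vswitch if v.name == 'vSwitch0')
new_spec = vswitch.spec
new_spec.mtu = 9000
net_sys.UpdateVirtualSwitch(vswitchName='vSwitch0', spec=new_spec)

# Raise the MTU on each vMotion vmkernel port.
for vmk in ('vmk1', 'vmk2'):
    net_sys.UpdateVirtualNic(device=vmk,
                             nic=vim.host.VirtualNic.Specification(mtu=9000))

Disconnect(si)
```

Remember that jumbo frames only help if they are enabled end to end, including on the physical switch ports carrying the vMotion VLAN.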

The Gotcha

There is a condition that may occur during long-running vMotion operations that can cause all ports configured for the vMotion VLAN to be flooded with vMotion traffic. As I understand it, this occurs when the physical switches' MAC address tables start timing out the MACs (before the ARP timeout). The reason is that although the outbound traffic is split over multiple vmkernel ports and multiple NICs, the ACKs coming back all originate from a single MAC. So after a while the physical network may time out the other MACs, as it isn't seeing any traffic from them. Because the transmissions are still occurring, the switches may start flooding every port that is configured for the vMotion VLAN. Since the problem is triggered by MAC timeouts around the 5 minute mark, you are more likely to experience it with 1Gb/s vMotion NICs, or with 10Gb/s vMotion NICs that have Network I/O Control or QoS limits imposed, as your migrations will generally take longer.
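To put rough numbers on that, the back-of-the-envelope sketch below estimates the memory copy time for a 96GB VM against a typical 300 second MAC aging timeout. It ignores page re-copies and protocol overhead (the 90% efficiency figure is just an assumption), so treat the output as a floor rather than a prediction.

```python
# Rough, illustrative estimate only: time to copy a VM's memory over the
# vMotion network vs. a typical ~300 second switch MAC aging timeout.
# Real migrations also re-send dirtied pages, so actual times are longer.
def copy_time_seconds(ram_gb, link_gbps, nics, efficiency=0.9):
    """Seconds to push ram_gb of memory over `nics` links of link_gbps each."""
    return ram_gb * 8 / (link_gbps * nics * efficiency)

MAC_AGING = 300  # assumed default MAC address table aging time, in seconds

for ram_gb, link_gbps, nics in [(96, 1, 2), (96, 10, 2)]:
    t = copy_time_seconds(ram_gb, link_gbps, nics)
    verdict = 'exceeds' if t > MAC_AGING else 'stays under'
    print(f'{ram_gb}GB VM over {nics} x {link_gbps}Gb/s: ~{t:.0f}s '
          f'({verdict} the {MAC_AGING}s aging timeout)')
```

On those assumptions a 96GB VM over 2 x 1Gb/s takes roughly 7 minutes of copying, well past the aging timeout, while the same VM over 2 x 10Gb/s finishes in well under a minute.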

To work around this problem, you may be able to adjust the MAC timeout values on your switches, depending on the type of switches you've got. The default MAC aging timeout on Cisco switches is normally 5 minutes. On the Dell 8024 10GBase-T switch I've got in my lab, the Address Aging value defaults to 301 seconds and is adjustable. Be careful if you choose to adjust these values, as there may be other consequences; any adjustments should be tested and applied only to the switches connecting directly to your vSphere hosts carrying the vMotion VLAN.

VMware is aware of this problem and is working on a fix. The fix didn't make it into ESXi 5 Patch 03, which was released on 12/07/2012 (07/12/2012 for those in the USA). I hope it makes it into the next vSphere 5 update release. I will update this article when the problem is fixed and let you know which patch or update you need to apply. Until then, I hope you are able to make use of Multi-NIC vMotion by applying the above workaround. At the very least, configure it in your test environment and see how it goes.

Final Word

If you thought vMotion in vSphere 5 was already fast, you ain't seen nothing yet until you've experienced Multi-NIC vMotion. Even with this slight gotcha, it still has real benefits if you can apply the workaround in your environment. Especially with very large VMs (>96GB RAM) and large hosts (>256GB RAM), it will significantly improve your migration times.