VMware


01/15/2009

How to exploit the test bubble for all it’s worth

VMware’s Site Recovery Manager (SRM) has a number of interesting features that enable a whole new way to look at disaster recovery and business continuity. The one we are going to look at today is the test failover, and more specifically the networking options around it.

It is important during a test failover to test your virtual machines without them being visible to the production systems. IP address and name conflicts can happen if you are not careful, and they will ruin your day. So isolating those VMs is important, but we need to do it in a way that still lets us test properly.

If you have only one ESX server you can use the automatically generated test bubble network that connects the VMs together. This private virtual switch allows VMs to talk to each other without the traffic leaving that switch, so we preserve the important isolation. This is the default for a recovery plan, and it is shown as Auto in the Test Network column when you edit the Network part of a recovery plan. You can see this in the screen shot below. You can learn more about this on page 62 of the Admin Guide; the URL for it is listed below.

When you have multiple hosts and need VMs on each of the hosts to communicate with each other, we cannot use the private virtual switch method because it is too isolated: it does not span hosts. The solution is to use an isolated VLAN. This VLAN is connected to a virtual switch that can be specified in the same place where Auto appears. You can see this in the figure below. In this example, when a test failover occurs, a VM that is normally connected to the CorporateLAN virtual switch will be connected to the CRM Test Network virtual switch for the duration of the test. With the isolated VLAN connected to that virtual switch on each host, your VMs will all be able to communicate.
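The choice between the Auto bubble and a named test network can be sketched in a few lines of Python. This is only an illustration of the reasoning above and not an SRM API: the function names, the VM and host names, and the data layout are all made up for the example. Only the Auto label and the CorporateLAN / CRM Test Network names come from the article.

```python
from typing import Dict, Iterable, Mapping

AUTO = "Auto"  # label shown in the Test Network column for the default bubble


def auto_bubble_is_enough(vm_to_host: Mapping[str, str]) -> bool:
    """True when every VM lands on the same host, so a host-local
    private virtual switch still lets all of them talk to each other."""
    return len(set(vm_to_host.values())) <= 1


def plan_test_networks(
    recovery_networks: Iterable[str],
    vm_to_host: Mapping[str, str],
    isolated_switch: str,
) -> Dict[str, str]:
    """Pick a test network for each recovery network (hypothetical helper).

    One host: keep the default Auto bubble.
    Several hosts: use the virtual switch backed by the isolated VLAN,
    because the Auto bubble does not span hosts.
    """
    target = AUTO if auto_bubble_is_enough(vm_to_host) else isolated_switch
    return {network: target for network in recovery_networks}


if __name__ == "__main__":
    # Hypothetical placement: two VMs recovered onto two different hosts.
    placement = {"crm-web": "esx01", "crm-db": "esx02"}
    print(plan_test_networks(["CorporateLAN"], placement, "CRM Test Network"))
    # prints {'CorporateLAN': 'CRM Test Network'}
```

The real setting is made by hand in the Network part of the recovery plan; the sketch only captures the rule of thumb for when Auto stops being sufficient.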

While we may now have a connection among the virtual machines, it may not be completely usable. You may need services such as AD domain controllers, or DNS / DHCP, which today’s corporate networks cannot live without. According to best practices these services already exist at the recovery site, so in a real failover they will be there for you to use. In a test failover, however, they are not available to the isolated VMs, so you need to include them on the isolated VLAN. This is not hard to do when everything is virtual because we support manual or scripted hot cloning, and if your infrastructure services are not yet virtualized you can use Converter as a great way to move your physical servers into the virtualized world. You can clone your AD domain controllers and your DNS / DHCP machines and put the clones on that isolated VLAN.

While it is a little more work, I would recommend deleting these clones at the end of your test failover and creating them new when you need them again. If you are good with AD / DHCP / DNS, then you can instead return them to the corporate network by changing their virtual switch. It is important to keep these infrastructure machines current, so they need to communicate outside the isolated VLAN at times. Current means things like passwords, DNS names and IP addresses. If your DNS / DHCP servers stay on the isolated VLAN, you may not be able to resolve names or IPs as you do on your corporate network. AD domain controllers are worse off: they will eventually have serious problems (and become useless) if they do not see their peers on a regular basis.
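If you do keep infrastructure clones around between tests, it helps to track how long each one has been cut off from its peers. Here is a minimal Python sketch of that bookkeeping. The record layout, the clone names, and especially the `max_isolated_days` threshold are assumptions for illustration; the right limit depends on your own AD and DNS / DHCP setup and is not something this article specifies.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class InfraClone:
    name: str          # hypothetical clone name
    role: str          # e.g. "AD", "DNS", "DHCP"
    last_synced: date  # last day it was recloned or was on the corporate network


def stale_clones(
    clones: List[InfraClone], today: date, max_isolated_days: int
) -> List[str]:
    """Return the names of clones isolated longer than the chosen limit.

    max_isolated_days is an assumed, site-specific threshold.
    """
    limit = timedelta(days=max_isolated_days)
    return [c.name for c in clones if today - c.last_synced > limit]


if __name__ == "__main__":
    clones = [
        InfraClone("ad-dc-clone", "AD", date(2008, 12, 1)),
        InfraClone("dns-clone", "DNS", date(2009, 1, 10)),
    ]
    # With an assumed 30-day limit, only the AD clone is flagged.
    print(stale_clones(clones, today=date(2009, 1, 15), max_isolated_days=30))
    # prints ['ad-dc-clone']
```

Anything the check flags is a candidate for deleting and recloning before the next test, which is the approach recommended above.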

[Screen shot: the Network section that you can see when editing a Recovery Plan]

So we now have a private isolated network between VMs on a variety of hosts. We have services like DNS / DHCP on that same isolated network. But now we need to give people access so they can do the testing. While this can be done via the Virtual Infrastructure Client, that is often not the best way. It requires more resources than an RDP or VNC connection direct to the VM, and best practices for VirtualCenter suggest fewer than 20 connections to VC at one time.

There are several suggestions that can help with this access issue. You can have PCs that are connected directly to the isolated VLAN. I have set this up in shared workspaces like coffee rooms and classrooms. People are able to work as they expect because of the shared infrastructure services, like AD or DNS, in the isolated VLAN. Another option, and my favorite, is to have one or more VDI servers that have access to the isolated VLAN. Users can then connect from their own desk to the VDI server using a browser and get a desktop VM on the inside of that isolated VLAN. This requires the VDI server to have two network cards: one for the isolated VLAN and one for the corporate network. This VDI server is NOT a router, so it will not break the important isolation rule.
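To see why a dual-homed VDI server does not break isolation as long as it does not route, here is a small Python model. It treats networks as nodes and lets traffic cross from one network to another only through a machine that forwards packets. The class, the machine name, and the network names are hypothetical; this is a reasoning aid, not a tool that inspects a real network.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Machine:
    name: str
    networks: List[str]     # networks this machine has a NIC on
    forwards: bool = False  # True only if it routes or bridges traffic


def reachable_networks(start: str, machines: List[Machine]) -> Set[str]:
    """Networks that traffic starting on `start` can reach, crossing
    between networks only through machines that forward."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for machine in machines:
            if machine.forwards and current in machine.networks:
                for other in machine.networks:
                    if other not in seen:
                        seen.add(other)
                        queue.append(other)
    return seen


if __name__ == "__main__":
    # Two NICs, no routing: the isolated VLAN stays isolated.
    vdi = Machine("vdi01", ["corporate", "isolated-vlan"], forwards=False)
    print(sorted(reachable_networks("isolated-vlan", [vdi])))
    # prints ['isolated-vlan']

    # The same box misconfigured as a router would leak traffic.
    leaky = Machine("vdi01", ["corporate", "isolated-vlan"], forwards=True)
    print(sorted(reachable_networks("isolated-vlan", [leaky])))
    # prints ['corporate', 'isolated-vlan']
```

The point of the second case is that the isolation depends on the server's configuration, so it is worth confirming that forwarding is off on any machine with a foot in both networks.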

So we now have an isolated VLAN with infrastructure services and VMs from a variety of hosts, we have users accessing those VMs, and there is no chance of our testing touching the corporate assets that they mirror.

SRM is groundbreaking workflow software that, like our products before it, really highlights how virtualization truly enables IT, and I think the test failover networking I wrote about here lives up to that!

Isn’t virtualization wonderful!

More Information?

SRM Administration Guide - http://www.vmware.com/pdf/srm_10_admin.pdf

Virtual Switch info – ESX Server Configuration Guide - http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_3_server_config.pdf

SRM 1.0 Update 1 Release Notes - http://www.vmware.com/support/srm/srm_10_releasenotes.html

SRM Community Forum - http://communities.vmware.com/community/vmtn/mgmt/srm?view=discussions

VMware Converter product page - http://www.vmware.com/products/converter/

VMware SRM product page - http://www.vmware.com/products/srm/

May I ask a small favor? Please let us know what you think of this article and blog by leaving comments. If there is something you would like us to write about, please do not hesitate to leave a comment about that! Thanks for helping.

Michael



Comments


Excellent info Michael!
Thank you for such a quality post!
-David
www.VMwareVideos.com

Thanks David for the nice comment! If there is anything else you would like to see in this blog let us know.

Michael

I get around the AD / DHCP / DNS problem in the test bubble by just including the server from the protected site that runs these services in the recovery plan.

Correct, Adrian, that can work well. However, it is important to have a virtualized AD server, and to 'refresh' it occasionally so that it doesn't become stale.

Adrian,

I forgot to mention that with your method you need to make sure that changes made during the test don't change the DC on the production side, as that can cause issues: password changes, user information, creating or deleting servers. AD timestamps a lot of things and updates a lot of things, and this can be an issue when you take your test environment down.

Michael

"Another example, and my favorite is to have one or more VDI servers that have access to the isolated VLAN and then users can connect from their own desk using a browser to the VDI server and have access to a desktop VM on the inside of that isolated VLAN. This requires the VDI server to have two network cards – one for the isolated VLAN, and one for the corporate network"

Can you expand on that? What kind of VDI server: VMware View, or can it just be an ESX box with guest VMs?
