In this post we will focus on the activity that has surrounded VMware Site Recovery Manager (SRM) since its launch. According to our own download page, SRM build 97878 has been available since 2008/06/19, so what has been happening with SRM in the field since that date? The short answer is a lot! SRM has been written about in many places, VMware evangelists (yes, that's you Mike!) now have books specifically dedicated to SRM administration, and SRM has been a big draw at every VMware stand at the shows we have put on or attended during 2008.
As one of VMware's technical folks, my main interest is what has been good, bad or plain ugly from an implementation point of view. I spend a lot of my time assisting customers and partners with their SRM deployments, configuration woes and the like, so I wanted to give you all a quick rundown of some of the key gotchas, as well as some pointers to useful references for SRM help. Going forward, I hope to post further blog updates on specific topics of SRM implementation, including closer looks at networking, storage replication integration, sample architectures and customizing recovery plans, to name a few. For now, let's get started with some of the common questions that come up.
Q. We have installed SRM, so why can't we see any SRM screens inside the vCenter Client?
To make the SRM icon and screens available, you must download and install the SRM plugin via the vCenter Client “Manage Plugins” menu. Your vCenter user ID will also need the appropriate privileges to work with SRM.
Q. Where should we install SRM (and the SRA)?
o In a VM?
o On the same VM as vCenter?
o Should the SRM database reside alongside the vCenter database?
o Can the SRM database be of a different type, e.g. Oracle?
It depends on a number of factors, some of which are listed below. For POC/evaluation and test environments, most customers deploy the two SRM servers (and their databases) alongside their vCenter servers, within the same virtual machines. In production environments, the reality of day-to-day operational processes will probably mean the SRM server and SRA (storage replication adapter) are installed together in their own virtual machine, which makes tasks such as raising change requests for maintenance and patching more straightforward. Other factors include:
o Size of the VI environment (number of ESX hosts/number of VMs)
   o With a small number of hosts and VMs, customers often deploy SRM in the same VM as vCenter, as in this configuration the vCenter server is typically lightly loaded.
   o With a larger number of hosts and VMs, customers tend to install the SRM components in a separate VM.
o Type of SRA being used, e.g. does your SRA need access to “admin” LUN(s) to communicate with the storage?
Q. We download the SRA for our storage platform from vmware.com and install it. No other checks are needed; is that correct, we just go ahead and install?
Not quite. Each vendor provides a readme, and you must review this first. Second, each storage vendor also generally supplies a whitepaper or technote covering best practice for setting up their adapter (SRA), so make sure you seek these out! Links to documentation from storage vendors can be found on the SRM resources page: http://www.vmware.com/products/srm/resource.html
Q. Do all of the SRAs communicate with their respective storage arrays in the same way?
No. Each vendor's connectivity architecture is different: some require the installation of a client-side remote command suite (provided by the storage vendor) and some don't. Again, review your storage vendor's readme and implementation guide and, if you have one, speak to your storage team. Don't forget that the SRAs are supported by the storage vendors, so if you do have issues with the adapters you can raise support requests with your storage vendor, assuming you have a valid support contract.
Q. Our Storage Replication Adapter (SRA) is installed correctly and all seems OK; however, no datastores appear in the datastore groups screen. Why?
If the replicated VMFS datastores are empty, i.e. contain no VMs, they will not appear. Add VMs to the datastore(s) and use the Rescan Arrays button to update the view.
Q. When creating a protection group, SRM prompts for a datastore location to house “Placeholder VMs”. What are these used for?
Placeholder virtual machines are used to identify the location of the recovered VM within the recovery site vCenter inventory. SRM replaces the placeholder VM with the VM registered from the replicated storage during testing or failover.
Q. During the install process, port 80 is defined as the communication port for vCenter. Can this be changed?
Even though SRM uses SSL when it communicates with vCenter, it does not use port 443. SRM establishes a TCP connection to port 80, uses an HTTP CONNECT request to establish a tunnel to the vCenter server, and then performs an SSL handshake with vCenter over that tunnelled connection. The SRM installation enforces these semantics.
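The connection sequence described above (TCP to port 80, HTTP CONNECT, then an SSL/TLS handshake over the same socket) can be sketched with Python's standard library. This is only an illustration of the technique, not SRM's code: the function names are hypothetical, and the host and port placed in the CONNECT line (`tunnel_target`, `tunnel_port`) are assumptions, since the exact request SRM sends is not documented here.

```python
import socket
import ssl


def build_connect_request(target_host: str, target_port: int) -> bytes:
    """Build an HTTP/1.1 CONNECT request asking for a tunnel to target_host:target_port."""
    authority = f"{target_host}:{target_port}"
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status_code(status_line: bytes) -> int:
    """Return the numeric status code from a line such as b'HTTP/1.1 200 OK'."""
    parts = status_line.split(b" ", 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        raise ValueError(f"unexpected status line: {status_line!r}")
    return int(parts[1])


def open_tunnelled_tls(vcenter_host: str, tunnel_target: str, tunnel_port: int = 443,
                       timeout: float = 10.0) -> ssl.SSLSocket:
    """Hypothetical helper: TCP to port 80, send CONNECT, then TLS over the same socket."""
    sock = socket.create_connection((vcenter_host, 80), timeout=timeout)
    # ASSUMPTION: tunnel_target/tunnel_port are illustrative; SRM's real CONNECT target may differ.
    sock.sendall(build_connect_request(tunnel_target, tunnel_port))
    # Read the response headers up to the blank line. The server sends nothing
    # further until the client starts the TLS handshake, so no data is lost here.
    response = b""
    while b"\r\n\r\n" not in response:
        chunk = sock.recv(4096)
        if not chunk:
            sock.close()
            raise ConnectionError("connection closed before CONNECT response")
        response += chunk
    status_line = response.split(b"\r\n", 1)[0]
    if parse_status_code(status_line) != 200:
        sock.close()
        raise ConnectionError(f"tunnel refused: {status_line!r}")
    # Default certificate verification; a vCenter with a self-signed certificate
    # would need its CA loaded into this context.
    context = ssl.create_default_context()
    return context.wrap_socket(sock, server_hostname=vcenter_host)
```

The point of the sketch is that only port 80 needs to be reachable: the encryption is negotiated inside the tunnel rather than on a separate 443 listener.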
Q. Which datastore should be selected to hold the placeholder VMs? What should we consider?
The first recommendation is to locate all of the placeholder virtual machines in the same datastore at the recovery site. That way they are easier to find should you need to, which also makes any troubleshooting simpler.
Having a small datastore set aside solely for SRM placeholder virtual machines also means you are not placing them in recovery site datastores that contain virtual machines permanently resident at that site. vCenter users who are not authorized to use, or are unfamiliar with, SRM may find it confusing to stumble across a placeholder virtual machine folder in a datastore normally used for other virtual machines. Other factors to consider:
o The datastore needs to reside at the recovery site.
o The datastore does not need to be replicated.
o Sizing: the datastore will only contain VM configuration files (*.vmx, *.vmxf, *.vmsd), typically 3 files of less than 1 KB each per placeholder VM.
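As a rough illustration of why a small datastore is enough, the sizing guideline above (about 3 files under 1 KB each per placeholder VM) can be turned into a back-of-the-envelope estimate. The function name is hypothetical, and the estimate covers file contents only; actual space consumed depends on the file system's allocation unit, so treat it as a lower bound on disk usage rather than a precise figure.

```python
def placeholder_footprint_kb(num_protected_vms: int, files_per_vm: int = 3,
                             max_file_kb: float = 1.0) -> float:
    """Upper bound on placeholder file *contents*, in KB (ignores allocation overhead)."""
    if num_protected_vms < 0:
        raise ValueError("num_protected_vms must be non-negative")
    # Each placeholder VM contributes files_per_vm config files of at most max_file_kb each.
    return num_protected_vms * files_per_vm * max_file_kb


# Example: 200 protected VMs -> at most about 600 KB of placeholder file content.
print(placeholder_footprint_kb(200))  # 600.0
```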
Q. Which vCenter object is SRM enabled on: host, cluster or resource pool?
In SRM the basic unit of replication is the datastore. Recovered VMs can be placed on arbitrary hosts/clusters, as long as the hosts can access the replicated datastores.
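The placement rule above (any host will do, provided it can see the replicated datastores) amounts to a simple subset check. The sketch below is a hypothetical illustration, not how SRM selects hosts internally; the host and datastore names are made up.

```python
from __future__ import annotations

# Hypothetical recovery-site hosts and the datastores each one can access.
HOSTS: dict[str, set[str]] = {
    "esx-r1": {"ds-repl-01", "ds-repl-02", "ds-local-r1"},
    "esx-r2": {"ds-repl-01"},
    "esx-r3": {"ds-repl-01", "ds-repl-02"},
}


def eligible_hosts(vm_datastores: set[str], host_access: dict[str, set[str]]) -> list[str]:
    """Return hosts that can see every datastore the VM's files live on."""
    return sorted(host for host, visible in host_access.items()
                  if vm_datastores <= visible)


# A VM with disks on both replicated datastores can only land on esx-r1 or esx-r3.
print(eligible_hosts({"ds-repl-01", "ds-repl-02"}, HOSTS))
```

Cluster membership does not appear anywhere in the check, which matches the point that SRM works at the datastore level rather than the host or cluster level.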
Q. Do all the VMs we are protecting need to be in a cluster?
No. SRM only requires separate vCenter instances: one managing the protected site and the other managing the recovery site.
Q. For failover, how does SRM guarantee resources at the recovery site?
SRM can suspend local VMs at the recovery site as part of a recovery plan. Best practice is also to use resource pools at the protected site and map these to resource pools at the recovery site using SRM “Inventory Mapping”.
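Conceptually, an inventory mapping is a lookup table from protected-site resource pools to recovery-site resource pools. The sketch below models that idea with a dictionary and flags protected pools that have no mapping yet. The pool names and functions are hypothetical; in SRM the mappings themselves are defined in the “Inventory Mapping” screens, not in code.

```python
from __future__ import annotations

# Hypothetical protected-site -> recovery-site resource pool pairs.
POOL_MAPPING: dict[str, str] = {
    "Prod/Tier1": "DR/Tier1",
    "Prod/Tier2": "DR/Tier2",
}


def recovery_pool_for(protected_pool: str, mapping: dict[str, str]) -> str:
    """Return the recovery-site pool mapped to a protected-site pool."""
    try:
        return mapping[protected_pool]
    except KeyError:
        raise LookupError(f"no inventory mapping for {protected_pool!r}") from None


def unmapped_pools(protected_pools: list[str], mapping: dict[str, str]) -> list[str]:
    """List protected-site pools that still need a mapping before failover."""
    return sorted(pool for pool in protected_pools if pool not in mapping)


print(recovery_pool_for("Prod/Tier1", POOL_MAPPING))                 # DR/Tier1
print(unmapped_pools(["Prod/Tier1", "Prod/Dev"], POOL_MAPPING))      # ['Prod/Dev']
```

Checking for unmapped pools up front is the design point: a gap in the mapping is much cheaper to find during planning than during a real failover.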
Q. Since we installed SRM, we see the “Recompute Datastore Group” task run periodically within vCenter. What triggers these tasks?
Datastore group computation is triggered by the following events:
o An existing VM is deleted or unregistered
o A VM is Storage VMotioned to a different datastore
o A new disk is attached to a VM on a datastore not previously used by the VM
o A new datastore is created
o An existing datastore is expanded
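Most of these triggers change which datastores each VM's files are spread across. A minimal sketch of why that matters, assuming (as a simplification) that datastores used by the same VM must end up in the same group, is a union-find over VM disk placement. This is a hypothetical model for intuition only: it ignores array-side replication grouping and is not SRM's actual algorithm. All VM and datastore names are made up.

```python
from __future__ import annotations


def compute_datastore_groups(vm_disks: dict[str, set[str]],
                             datastores: set[str]) -> list[set[str]]:
    """Group replicated datastores that are linked by VMs spanning them."""
    parent = {ds: ds for ds in datastores}

    def find(ds: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[ds] != ds:
            parent[ds] = parent[parent[ds]]
            ds = parent[ds]
        return ds

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for used in vm_disks.values():
        # Only replicated datastores take part; local datastores are ignored.
        replicated = sorted(used & datastores)
        for other in replicated[1:]:
            union(replicated[0], other)

    groups: dict[str, set[str]] = {}
    for ds in datastores:
        groups.setdefault(find(ds), set()).add(ds)
    return sorted(groups.values(), key=sorted)


replicated = {"ds1", "ds2", "ds3", "ds4"}
vms = {"app01": {"ds1", "ds2"}, "db01": {"ds3"}, "web01": {"ds2", "local-ds"}}
print(compute_datastore_groups(vms, replicated))

# After a Storage VMotion moves app01's second disk from ds2 to ds3,
# the groups change, which is why the recompute task runs.
vms["app01"] = {"ds1", "ds3"}
print(compute_datastore_groups(vms, replicated))
```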
Q. Occasionally when we log in to the SRM screens, we see the site pairing status displayed as “Low Resources on Paired Site”. What causes this?
The “Low Resources…” message can be generated if any of the following conditions is true on the server (VM) where SRM is installed at the paired (remote) site:
o Remote site free disk space drops below 100 MB (default)
o Remote site CPU usage goes above 70% (default)
o Remote site available memory drops below 32 MB (default)
These default values can be configured by modifying the vmware-dr.xml file located in C:\Program Files\VMware\VMware Site Recovery Manager\config. The fields to modify are minDiskSpace, maxCpuUsage and minMemory.
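The three checks above can be expressed as a small threshold function, together with a reader that picks up overrides for minDiskSpace, maxCpuUsage and minMemory. The field names come from the post; everything else is an assumption for illustration: the XML layout in the sample (elements with numeric text, nested under a made-up `resourceMonitor` element), the units (MB and percent), and the function names are hypothetical, so check your own vmware-dr.xml before relying on any of it.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Defaults as described in the post (assumed units: MB, percent, MB).
DEFAULT_THRESHOLDS = {"minDiskSpace": 100, "maxCpuUsage": 70, "minMemory": 32}


def read_thresholds(xml_text: str) -> dict[str, int]:
    """Return thresholds, overriding defaults with any matching elements found.

    ASSUMPTION: each field is an element whose text is an integer; the real
    vmware-dr.xml structure may differ.
    """
    root = ET.fromstring(xml_text)
    thresholds = dict(DEFAULT_THRESHOLDS)
    for name in DEFAULT_THRESHOLDS:
        elem = root.find(f".//{name}")
        if elem is not None and elem.text and elem.text.strip():
            thresholds[name] = int(elem.text.strip())
    return thresholds


def low_resources(free_disk_mb: float, cpu_pct: float, free_mem_mb: float,
                  t: dict[str, int]) -> list[str]:
    """Return the reasons (if any) a paired site would be reported as low on resources."""
    reasons = []
    if free_disk_mb < t["minDiskSpace"]:
        reasons.append("disk")
    if cpu_pct > t["maxCpuUsage"]:
        reasons.append("cpu")
    if free_mem_mb < t["minMemory"]:
        reasons.append("memory")
    return reasons


# Hypothetical config fragment raising the disk threshold to 200 MB.
sample = "<Config><resourceMonitor><minDiskSpace>200</minDiskSpace></resourceMonitor></Config>"
print(low_resources(150, 50, 64, read_thresholds(sample)))  # ['disk']
```

Any one condition is enough to trigger the warning, which is why the function collects every failing check instead of stopping at the first.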
Q. What are the SRM failback options? We see no button for failback, which is confusing us.
SRM absolutely supports failback, and each storage vendor documents the failback process for their specific replicated storage configuration. What you have to consider is that without SRM in your virtual environment you are back to manual and/or home-grown scripts for DR: you will no longer have automated Recovery Plans, offline DR testing capabilities or a DR audit trail. You can still fail back manually without using SRM; the high-level steps would be:
o Delete the protection groups in the Protected Site vCenter
o Unregister the protected virtual machines in the Protected Site vCenter
o Work with your storage team to reverse data replication
o Re-inventory the VMs in the Protected Site vCenter, then restart and re-IP them (manually or scripted)
With SRM in place you will have Recovery Plan(s), the ability to test failover before recovery, and a built-in audit trail. SRM can also be used to help you fail back once your primary site has been restored. The high-level steps would be:
o Delete the protection groups in the Protected Site vCenter
o Unregister the protected virtual machines in the Protected Site vCenter
o Work with your storage team to reverse data replication
o Leverage SRM: complete the SRM workflows in the reverse direction, from the Recovery Site back to the Protected Site
Then repeat the above steps from the Protected Site back to the Recovery Site to complete the re-protection of the virtual machines in the Protected Site.
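Because each failback step depends on the one before it (for example, replication must be reversed before SRM workflows can run in the opposite direction), teams sometimes track the sequence as an ordered checklist. The sketch below is a hypothetical runbook tracker that refuses out-of-order completion; the step wording paraphrases the SRM-assisted list above, and nothing here is an SRM API.

```python
from __future__ import annotations

# Hypothetical checklist paraphrasing the SRM-assisted failback steps;
# this is not an SRM workflow definition.
FAILBACK_STEPS = [
    "Delete protection groups in the Protected Site vCenter",
    "Unregister protected VMs in the Protected Site vCenter",
    "Reverse data replication (with the storage team)",
    "Run SRM workflows from the Recovery Site back to the Protected Site",
]


class Runbook:
    """Tracks an ordered list of steps and rejects out-of-order completion."""

    def __init__(self, steps: list[str]) -> None:
        self.steps = list(steps)
        self.completed = 0  # number of steps finished so far

    @property
    def next_step(self) -> str | None:
        """The step that must be completed next, or None when finished."""
        if self.completed < len(self.steps):
            return self.steps[self.completed]
        return None

    @property
    def done(self) -> bool:
        return self.completed == len(self.steps)

    def complete(self, step: str) -> None:
        """Mark `step` as done; raise if it is not the expected next step."""
        expected = self.next_step
        if expected is None:
            raise RuntimeError("all steps are already complete")
        if step != expected:
            raise RuntimeError(f"out of order: expected {expected!r}, got {step!r}")
        self.completed += 1


runbook = Runbook(FAILBACK_STEPS)
for step in FAILBACK_STEPS:
    runbook.complete(step)
print(runbook.done)  # True
```

A second `Runbook` instance with the directions swapped would cover the re-protection pass back to the Recovery Site.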
I hope that has answered a few FAQs. I am sure there will be more to come, but for now, thanks for stopping by!
Lee Dilworth