Hello everyone,
I have participated in a lot of SRM implementations and proofs of concept, and have talked to a lot of customers who have gone through an SRM implementation, and there is quite a pattern to the issues that can develop. Today I want to talk about those issues, because being forewarned can help you avoid them.
The issues generally fall into three categories:
Applications, Storage, and SRA
Applications means you need to know which applications are important to you, and what they require to work. Some applications require web services, and often those web servers will use background Windows authentication, which means they need domain controllers they can talk to in addition to the web servers themselves. Most applications today need DNS, and the users need DHCP. So one simple app may require 3, 4, or maybe even 10 other machines before it can work. Having an application registry, where you record each application, its owner, and the components it requires to work, will be very helpful in your DR project. It also helps with planning upgrades and with troubleshooting, so it is worth doing even without a DR project. There are also applications that can help you gather this data and map the relationships between applications.
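To make the registry idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the application names, the owners, and the `required_components` helper are hypothetical, and a real registry would more likely live in a CMDB or a spreadsheet. It records each application, its owner, and what it needs, then walks the dependencies to list everything one app needs in order to work.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    name: str
    owner: str
    # Names of other registry entries this application needs in order to work.
    requires: list[str] = field(default_factory=list)


def required_components(registry, app_name):
    """Return every component app_name needs, directly or indirectly."""
    seen = set()
    stack = list(registry[app_name].requires)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Follow the dependencies of this component, if it is registered.
        if name in registry:
            stack.extend(registry[name].requires)
    return seen


# Hypothetical example entries, not taken from any real environment.
registry = {
    "intranet-portal": Application("intranet-portal", "web team", ["web-server"]),
    "web-server": Application("web-server", "web team", ["domain-controller", "dns"]),
    "domain-controller": Application("domain-controller", "infrastructure team", ["dns"]),
    "dns": Application("dns", "infrastructure team"),
}

print(sorted(required_components(registry, "intranet-portal")))
```

Even in this tiny example, one portal pulls in a web server, a domain controller, and DNS, which is exactly the kind of chain that is easy to miss when you plan a failover.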
With storage, sometimes our customers have no pattern for which application is stored on which storage. That means when you protect the SharePoint set of LUNs, you may also have SAP, Exchange, and SQL VMs inside those LUNs. And when you do a failover, instead of being able to fail over SharePoint alone, you will have to fail over SAP, Exchange, and SQL as well, since they share the LUNs. Poor storage organization means little or no granularity in your DR strategy, which is worse than it sounds. In my experience, our customers fail over one or two applications more often than they fail everything over. If you are going to have a 4 hour outage in your datacenter on Saturday, would it not be nice to fail your email over so that it has only a very short outage? Not only is that a great test, since it is a real failover, but it also minimizes a legitimate outage. I recommend that your most important applications be able to fail over on their own, while your least important applications can all fail over together.
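A quick way to see how much granularity you really have is to check which applications share LUNs. The following is a rough sketch in Python, not anything SRM provides: the placement data, the LUN names, and the `must_fail_over_with` helper are all hypothetical. It also assumes that sharing is transitive, meaning that if SharePoint shares a LUN with SQL, and SQL shares another LUN with a third application, all three end up failing over together.

```python
from collections import defaultdict


def must_fail_over_with(placement, app):
    """Return the other applications dragged along when `app` fails over.

    `placement` maps each application name to the set of LUNs holding its VMs.
    Applications are linked if they share a LUN, directly or through a chain.
    """
    # Invert the mapping: which applications live on each LUN?
    apps_on_lun = defaultdict(set)
    for name, luns in placement.items():
        for lun in luns:
            apps_on_lun[lun].add(name)

    # Walk outward from `app` through every shared LUN.
    group = {app}
    stack = [app]
    while stack:
        current = stack.pop()
        for lun in placement[current]:
            for neighbour in apps_on_lun[lun]:
                if neighbour not in group:
                    group.add(neighbour)
                    stack.append(neighbour)

    group.discard(app)
    return group


# Hypothetical placement, for illustration only.
placement = {
    "SharePoint": {"lun-01", "lun-02"},
    "SAP": {"lun-02"},
    "Exchange": {"lun-03"},
    "SQL": {"lun-02", "lun-04"},
}

for name in placement:
    print(name, "->", sorted(must_fail_over_with(placement, name)))
```

In this made-up layout, Exchange sits on its own LUN and can fail over by itself, while SharePoint, SAP, and SQL are tied together through lun-02. An empty result for your most important applications is what you are aiming for.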
The Storage Replication Adapter (SRA) is the connection between SRM and your storage environment. It is what allows one button in SRM to trigger actions in VMware AND in your storage together. The SRA is developed by our storage partners against a VMware API, and there is a great range in what each vendor's SRA can do. You might find a storage vendor whose replication technologies come in a great many flavours with many features, while its SRA supports only a small subset of them. This may not be an issue, unless the replication features you are using are not in that subset. In addition, not all storage vendors document their SRA requirements equally. Some vendors have readmes that are not included in the SRA archive, while others include release notes, install guides, and readme files in the archive. Some vendors have white papers that cover not only how to use the SRA but also how to integrate SRM with the vendor's storage environment. You should investigate your storage vendor and their SRA carefully to avoid surprises and frustrations.
More information and useful links are in my VMworld presentation, which is where this information came from. It is session BC6703 and can be found at http://www.vmworld.com/docs/DOC-4823 . On the VMworld site you can watch and listen to the presentation, which I recommend if you can, as there is other information shared verbally. For those of you who have not attended VMworld, the slides are available as a PDF download (BC6703_formatted_v3).
As always, I very much appreciate you being our customer and using SRM. Let me know about your experiences by leaving a comment. If there are specific things you would like me to write about, leave a comment about that as well.
Thanks, and have a great day!
Michael