VMware

11/20/2009

VMware Data Recovery and File Level Restore

Today we released an updated version of VMware Data Recovery (VDR) – we have been working hard on this release for the past few months and we think it moves the needle significantly forward.    Get more information about the updated VDR release here.

So, what are the highlights of this updated release of VDR? 

  • Support for backing up Windows 2008 R2 and Windows 7 virtual machines (requires upgrading to vSphere 4.0 Update 1)

  • Customization of VDR parameters such as how frequently integrity check and reclaim operations run, the maximum number of concurrent backups allowed, and whether file level restore operations are disabled.  

  • The VDR virtual appliance is now a HW7 virtual machine and runs 64-bit CentOS 5.2 as the guest OS.

However, the most obvious change is the addition of a new  File Level Restore (FLR) client for Windows virtual machines – this is the fully supported version of the FLR client to allow granular restore of individual files.    This new functionality complements the full virtual machine restore capability that has been available since VDR 1.0.

Let's do a quick walk through of the new FLR client.  I installed the FLR client directly on the desktop of one of my Windows virtual machines and have accessed it via a console connection.  Double clicking on the FLR client icon will launch it.

FLR1

The first thing you will need to do is to log in to the VDR appliance.  By default, you log in via standard mode and the VDR appliance will display only the restore points that match the universally unique identifier (UUID) of the virtual machine that the FLR client is being executed from.   You can also log in via advanced mode, whereby all restore points for all protected virtual machines will be presented by the VDR appliance. 

Beyond the number of restore points presented, the operation of the FLR client is identical in either mode.  The use case for the standard mode is for individual users to perform self serve restores while the advanced mode is typically used by vSphere administrators to restore batches of files to different virtual machines.
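To make the difference between the two login modes concrete, here is a minimal sketch of the filtering described above. It is an illustration only, not the FLR client's actual code: the `RestorePoint` class, the `visible_restore_points` function and the sample UUIDs are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestorePoint:
    vm_uuid: str    # UUID of the protected virtual machine (hypothetical field)
    vm_name: str
    timestamp: str  # when the backup was taken


def visible_restore_points(points, client_vm_uuid, advanced=False):
    """Return the restore points a login in the given mode would list."""
    if advanced:
        # Advanced mode: every restore point for every protected VM.
        return list(points)
    # Standard mode: only restore points whose UUID matches the VM
    # that the client is running in.
    return [p for p in points if p.vm_uuid == client_vm_uuid]


catalog = [
    RestorePoint("uuid-a", "web01", "2009-11-18 02:00"),
    RestorePoint("uuid-a", "web01", "2009-11-19 02:00"),
    RestorePoint("uuid-b", "db01", "2009-11-19 02:00"),
]

standard = visible_restore_points(catalog, "uuid-a")
advanced = visible_restore_points(catalog, "uuid-a", advanced=True)
print(len(standard), len(advanced))
```

Everything after the filtering step (mounting, browsing, copying files) is the same in both modes, which is why the sketch only models the list of restore points.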

FLR2

Once logged in to the VDR appliance, you are presented with the various restore points.  Highlighting a restore point and selecting Mount will mount the point in time copy of the virtual machine’s disk(s).

FLR3

VDR will mount the restore point as a read only directory – note the directory with the identical date and time stamp below. 

FLR4

Once the restore point is mounted, you can open up the directory, browse to the file that needs to be restored and drag/drop the file to another location.   Note that multiple restore points can be concurrently mounted.

FLR5

Once the restore operation is complete, close out the FLR client and the restore points will be automatically dismounted.


10/28/2009

SRM 4.0 and Windows 2008 Support

Hello Uptime Readers,

We have seen a lot of questions lately about SRM support for Windows 2008, and there seems to be a lot of confusion out there, so it seemed like a good time to write a short blog to try to clear things up.

 

When you are implementing SRM 4.0 and looking for information on operating system support, the first thing to understand is what function the operating system will serve. There are really two choices:

  1. It will be the operating system we use to install the SRM server (or SRM client plug-in) in to

  2. It will be the operating system we use in the virtual machines we want to protect with SRM

Let’s take each in turn.

 

SRM Server

When deploying SRM you need two SRM servers, one at each site. In nearly all cases the SRM server is itself deployed in a virtual machine, but that virtual machine is not classed as a protected virtual machine: its role is simply to run the SRM server at that site. It will not normally be placed on replicated storage, because there is no need to replicate an SRM server when the other site also runs one.

 

Customers increasingly want to deploy SRM into Windows 2008 virtual machines. Before you do this you should review the SRM Compatibility Matrix.

 

Specifically, review the section “SRM server operating System Compatibility”. Although both Windows 2008 x86 and x64 are listed in that section, you should review the table carefully and understand that at the time of writing the following statements are true:

  • If you want to use Windows 2008 to host your SRM Server note it is currently ONLY supported on the x86 (32bit) editions of Windows 2008 running SP1 (support for R2 editions will be reviewed on an ongoing basis). UPDATE: SP2 x86 support is now available!

  • If you want to use Windows 2008 to run your vSphere client and therefore install the SRM vSphere client plug-in note this is supported on the x86 and the x64 editions of Windows 2008 running SP1 (support for SP2 and R2 editions will be reviewed on an ongoing basis)
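If it helps, the two statements above can be written down as a small lookup table. This is only a sketch of what this post says at the time of writing (including the SP2 x86 update for the SRM server); the role names and the `is_supported` function are hypothetical, and the SRM Compatibility Matrix remains the authoritative source.

```python
# Windows 2008 support statements from this post, for SRM 4.0 at time of writing.
# Keys and the (architecture, service pack) tuples are illustrative labels.
SUPPORTED = {
    "srm_server": {("x86", "SP1"), ("x86", "SP2")},
    "client_plugin": {("x86", "SP1"), ("x64", "SP1")},
}


def is_supported(role, arch, level):
    """Return True if this Windows 2008 build is listed for the given role."""
    return (arch, level) in SUPPORTED[role]


print(is_supported("srm_server", "x64", "SP1"))
print(is_supported("client_plugin", "x64", "SP1"))
```

R2 editions are deliberately absent from both sets, since the post says support for them is still under review.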

Table2 

Protected Virtual Machines

Now that we have covered the SRM server what about the virtual machines you actually want SRM to protect, those virtual machines running your production workloads and applications that are sitting on your replicated storage.

 

As with the SRM server if you review the SRM Compatibility Matrix you will find the following section:

 

GuestOS 
 

If you are looking for clarification for Windows 2008 support (though you could use this example for any GuestOS) you need to understand what each of the above paragraphs is telling you. First I think we could improve the clarity here and this is something we will review internally for the next documentation update. If we start with “Guest Operating System Support” the statement is “SRM 4.0 supports all guest operating systems supported by vCenter 4.0”.

 

What does this actually mean? From the SRM perspective what this is actually telling you is that SRM can “protect” any guest operating system that is a supported guest operating system on the vSphere 4.0 platform. You can review the full list of supported guest operating systems for vSphere here by setting:

 

Product Name = ESX

Product Release version = ESX 4.0

OS Use = Guest OS

OS Family = Windows

OS Name = Windows Server 2008

 

All of the guest operating systems in that list could be protected by SRM 4.0, with one additional consideration: do you want to customize the GuestOS (for example, network changes using SRM’s IP customizer tool) during recovery? If the answer to that question is NO then any of the Windows 2008 operating systems listed on the HCL page you have just generated could be protected by SRM 4.0.

 

If you DO wish to customize the protected guest operating system during recovery using SRM 4.0's built in image customization capability (if you know how vCenter VM image customization works then you already understand this technique) then notice in the SRM 4.0 compatibility matrix picture above there is a second paragraph that refers to guest operating system customization support.

 

Although the section indicates that all of the same guest operating systems can be customized there are some versions of Windows 2008 that are not currently supported by SRM 4.0 for guest customization.

 

Currently SRM customization support for Windows 2008 does NOT include ANY R2 versions. Windows 2008 R2 is a new release of Windows and is considered by many to be the server release of Windows 7. This is really the source of the supportability differences between a Windows 2008 SP1/SP2 edition and an R2 edition.

 

Customization support for the R2 releases of Windows 2008 will be reviewed as part of our ongoing SRM update program.
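Put as a decision rule, the discussion above comes down to two questions: is the guest on the vSphere 4.0 supported list, and do you need customization during recovery? The sketch below encodes only what this post states; the function and parameter names are hypothetical, and it assumes R2 is the only Windows 2008 exclusion for customization, which you should confirm against the compatibility matrix.

```python
def srm_can_handle_guest(on_vsphere_40_guest_list, is_r2, customize_on_recovery):
    """Rough rule of thumb for Windows 2008 guests under SRM 4.0.

    Assumption: R2 is the only Windows 2008 exclusion for guest customization.
    """
    if not on_vsphere_40_guest_list:
        # SRM only protects guests that vSphere 4.0 itself supports.
        return False
    if customize_on_recovery and is_r2:
        # R2 guests can be protected but not customized during recovery.
        return False
    return True


print(srm_can_handle_guest(True, is_r2=True, customize_on_recovery=False))
print(srm_can_handle_guest(True, is_r2=True, customize_on_recovery=True))
```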

 

hope this helps,

Lee Dilworth


10/15/2009

Meet the SRM 4.0 Engineers!

SRM 4.0 was released on 10/5/09.  We hope that you have gotten a chance to evaluate the new features of this SRM release.  This release is the result of the hard work of a group of dedicated VMware software engineers and we would like to have them share their perspectives on the SRM features.  Maria and Glenn - both SRM software engineers – have shared their insights on the SRM features on videos:

 

Maria provided her insights on the new features of SRM 4.0.  In her video, she discussed the following topics:

  • vSphere support
    • Fault Tolerance
    • vDS
    • DPM
    • Linked Mode
  • NFS support
  • Shared Recovery Site
  • Enhancement in reliability, robustness and scalability

 

Glenn, on the other hand, focused on the SRM core features.  In his video, he discussed the following topics:

  • Virtual disaster recovery powered by SRM
  • Automated recovery workflow
  • Testing of recovery plans
  • SRM architecture and components
  • Test networks
  • Audit Trail

 

The 2 videos together give you an overview of the SRM 4.0 features and we highly recommend that you watch these videos if you are interested in learning more about SRM.  The links are listed below:

               

                Maria: SRM 4.0 Features

                Glenn: SRM Core Features

 

Thank you,

Desmond

 

 


10/05/2009

SRM 4.0 - Get it now!

When VMware demonstrated a technology preview of the new Site Recovery Manager (SRM) features at VMworld 2009 a month back, a significant portion of the audience during the VMworld SRM sessions indicated that they were waiting on NFS support to deploy SRM. Several customers said that they were waiting for SRM support to deploy vSphere 4.0 in production. We heard loud and clear how important these features are for your organization. I am excited that today we have delivered vSphere 4.0 and NFS support as a part of the SRM 4.0 GA release. 

 

This release would not have been possible without the involvement of our beta customers and partners. On behalf of the SRM team, I would like to send a thank you note to all the SRM 4.0 beta customers, storage partners and channel partners, who tested the SRM 4.0 beta product and provided us with valuable feedback during the SRM beta process. I also want to thank the storage partners who helped recruit participants for the SRM 4.0 beta program.

 

I am happy to share some great comments from beta participants:

 

“In my 25 years of IT experience, VMware vCenter Site Recovery Manager is one of the best solutions that I have ever implemented.  With the latest SRM improvements and features, coupled with vSphere, disaster recovery preparation is much easier.” – Ed Belliveau, Principal Systems Engineer, Bentley University

 

“VMware vCenter Site Recovery Manager is the industry-leading product for automated disaster recovery of virtual environments”, Hany Michael, Qatar General Electricity & Water Corporation

 

We hope you will try out the SRM 4.0 release in your vSphere environment and look forward to your feedback and comments. You can download SRM 4.0 here.  Please be sure to check out the release notes and documentation as well as other VMware Uptime blogs ("A quick tour of SRM 4.0"  as well as "SRM 4.0 is here! The wait for vSphere and NFS support is over") to learn more about new SRM 4.0 features as well as upgrading to SRM 4.0.

 

Thanks again.


A Quick tour of SRM 4.0

 

Now that SRM 4.0 is here it seemed like a good time to take a quick tour around a few of the new features and also highlight some less obvious elements of the vSphere platform that make working with SRM 4.0 simpler.

 

A sensible place to start would be at the beginning. As with previous SRM releases your first task after installing the SRM components is to log in to your vSphere client and download the SRM 4.0 plug-in.

 

Slide9-mod


 

Once the plug-in is installed and enabled you will finally have access to SRM from the vSphere client.

 

Slide1-mod


 

Customers who have upgraded from SRM 1.x or have previous experience of SRM 1.x will be familiar with the SRM site pairing and login process (both of which are covered in the SRM admin guide). 

 

One new feature in SRM 4.0 is the ability to configure a multi-tenancy shared site setup. If you have this feature enabled then you will  be presented with an additional step during your login process:

 

Slide3-mod


 

 

At this stage you simply select the SRM site name you wish to connect to and the system will then authenticate your account with that site. It is possible through the simple privilege and permissions model to allow users access to one site but not others which is obviously a key requirement for a multi-tenancy setup.

 

Note: shared site setup is covered here.

 

Note to reader and myself: let us know via the comments section if you would like the shared site feature covered in more detail in a future blog post.

 

So once you are logged in to SRM 4.0 the UI layout and operation looks very much the same as SRM 1.x (if you've worked with that before) so what is new? What makes your life simpler? One configuration option outside of SRM that can make working with SRM very slick is the vCenter 4 linked mode option. Linked mode basically allows you to administer multiple vCenter instances from within a single vSphere client:

 

Slide2-mod


 

The SRM architecture always involves the use of two vCenter instances: one at your protected site and one at your recovery site. In SRM 1.x and vCenter 2.x setups, tasks such as creating protection groups, inventory mappings, IP customization setup or running recovery plan tests always required at least two VI client windows to be open.

 

Slide4-mod


 

With SRM 4.0 and the vSphere platform, if we now utilize linked mode vCenter we can control and monitor the whole environment from within a single vSphere client session, as we can now see both the protected and recovery site inventories side by side. As an example, a simple operation such as "Create Protection Group" results in placeholder VM's being created at the recovery site. We can now see these objects appear in real time within our client window and sanity check for ourselves that these have appeared as we expected and within the right vCenter object locations (resource pool/folders/network).

 

That mention of inventory mappings and networks is a nice lead into another common question that has come up in recent weeks: "Will SRM 4.0 recognize and work with distributed network objects such as vDS and/or 3rd party virtual switches?" The short answer is yes, and I can illustrate this quite simply by showing an inventory mapping screenshot from my SRM 4.0 setup which has access to 3rd party virtual switches at both the protected and recovery sites:

 

Slide7-mod


 

What other things can we quickly find in SRM 4.0 that can make life easier? Well, most if not all of the tweaks (or advanced options, if you like) used to be configurable only by editing the vmware-dr.xml file on each of your SRM servers. This was all well and good, but editing XML files in production is not really ideal and I guess you don't really want to have to drop out to an editor every time you want to make a change.

 

The good news is that in SRM 4.0 you can now make the changes, if any are needed, from within the SRM 4.0 UI itself. To find the way into this screen you simply need to right click on the words "Site Recovery" that appear in the left hand pane tree menu of any SRM screen. Here is a quick look at this in my environment:

 

 

Slide4-mod


 

Another nice little feature (there are MANY others but this is a blog and I have already rambled on too long in this post!!!) is something that caught a LOT of customers (and me included) out in SRM 1.x. Essentially the issue was if I replicated a new datastore to my recovery site and had not got around to actually creating any VM's in it yet I would not see the datastore in the SRM "Review Replicated Datastores" screen.

 

The result of this was a lot of head scratching, storage replication configuration sanity checking and general confusion until you suddenly put a VM in the datastore and then ZAP the datastore suddenly appears. The logic behind why SRM 1.x chose not to display empty replicated datastores was a sound one (SRM is all about protecting VM's, if there is an empty datastore there are no VM's to protect) but it made more sense going forward to at least show that the datastores (or exports now we have NFS support) are there even if they are empty:
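The behavior change is easy to picture as a filter that SRM 1.x applied and SRM 4.0 no longer does. The sketch below is purely illustrative: the function name, the `show_empty` flag and the datastore names are made up for this example and are not SRM internals.

```python
def review_replicated_datastores(datastores, show_empty):
    """List replicated datastores as the review screen would.

    `datastores` maps a replicated datastore name to the VMs stored on it.
    """
    return [name for name, vms in datastores.items() if show_empty or vms]


replicated = {"ds_prod": ["vm1", "vm2"], "ds_new": []}

srm_1x_view = review_replicated_datastores(replicated, show_empty=False)
srm_40_view = review_replicated_datastores(replicated, show_empty=True)
print(srm_1x_view, srm_40_view)
```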

 

Slide5-mod


 

At that point I think I have gone on for long enough and there are a stack of emails waiting for responses in my inbox so it's back to work for me........actually one final quick pic....let's see SRM 4.0 recovering an NFS export:

 

Slide8-mod


 

Nice thing here is that during the failover or test recovery there was no need for us to prime the recovery array at all. The new SRA's that support NFS handle all of the export creation / mappings for the recovery site ESX hosts.

 

Enjoy SRM 4.0!

 

Lee Dilworth


10/04/2009

SRM 4.0 is here! The wait for vSphere and NFS support is over!

October 5 is the day that you can start using the new version of SRM to protect vSphere 4.0 hosts as well as the ESX 3.0.3 and 3.5 hosts that it protects today.  In particular, I love using the vSphere Linked Mode support with SRM as it is a little easier to use one client for both the protected and recovery sides instead of the two clients we require now.  This article is not about the new features however, but about how to upgrade.  

 
But some things to touch on first. It is actually 4.0 since our marketing people decided it would be easier for customers if SRM and vSphere version numbers were in sync.  This also means in the future you will not have to wait long for SRM to work with the next major releases of vSphere like you did this time.  There is something else for you to be aware of: SRM 4.0 requires you to use vSphere Virtual Center 4.0.  You can still protect VM’s hosted on 3.0.3, 3.5 and now 4.0 but you must use VC 4.0 and not VC 2.5.  From now on SRM will require the current version of VC to work with.  As well, SRM 1.0 licenses are not compatible with SRM 4.0.  We do not use Flex LM with SRM any longer so you will need to log into the Customer license portal to download your new SRM licenses.

So let's talk about upgrading.  You have several choices.  You can install new, with a new database, upgraded VC, and protect your VM’s with new Protection Groups and Recovery Plans, or you can upgrade.  Upgrade means to upgrade your VC, SRM, and database, and this will keep your existing PG’s and RP’s.  This is the quickest way to upgrade and keep your protection.  Doing a net new install will be generally slower.  For that reason, I recommend doing an upgrade as it will have the smallest outage in your protection.

If you would like to have a new SRM server but with your old configuration that is a little harder.  You will need to do an upgrade in place on your old SRM server, and then after that is complete and functional, you can install a new SRM server, and point it at the upgraded database.  If you want a new SRM server this is the ONLY way you can do it safely.  If this sounds a little confusing, call our Support team and we can help you with it.

While the two links below have more information about the upgrade, I did want to suggest some things. 

  1. Log into the license portal to get your new SRM 4.0 licenses
  2. Log into the SRM download page to get any updated SRA’s – not all SRA’s are necessary to be upgraded.  As a guideline, if you are going to use NFS you MUST upgrade your SRA, but if you are going to continue with iSCSI or FC you may not need to upgrade.
  3. Read the release notes, and the KB article.
  4. Start with full backups!  And test SRM so you know it works before you upgrade.  A test failover is fine here as a test of operation.
  5. Upgrade the protected side VC.  
  6. Upgrade the protected side SRM / database and plug-in.
  7. At this point you can still do a full failover if necessary, so your ability to recover has not been impacted so far.  You cannot protect other VM’s or anything else.  This situation is as if your protected side disappeared.
  8. Now your DR outage occurs!
  9. Upgrade your recovery side VC.  Optionally this is where you might enable Linked Mode.  I certainly did.
  10. Upgrade your recovery side SRM / database and plug-in.
  11. Install the new licenses on both sides – if you have protected VM’s on both sides.  The new licenses are entered in the License Settings field in the Advanced Settings, which is accessed by <right + clicking> on Site Recovery Manager  in the navigation pane. 
  12. Perform a test failover to make sure everything works.  At this point your test failover should work, but you still have ESX hosts at the same level, VM’s have not been touched, and the only change is the VC and SRM instances that have been upgraded.  All your VM's should be still working fine.
  13. You can now wait to upgrade your ESX hosts as necessary.  I recommend starting the upgrade at the Recovery side so that your production can failover easily.  Meaning the ESX 3.x at the protected side can failover to the ESX 4 hosts at the recovery side.  But don’t upgrade the Virtual Hardware or tools yet.
  14. After the recovery side is upgraded to ESX 4, you should do the protected side.  But do not do Virtual Hardware or tools upgrade yet.
  15. Once both sides are upgraded to ESX 4, test SRM with a test failover.
  16. Now you can upgrade Virtual Hardware and Tools.
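The ordering above matters more than any single step, so here is a rough sketch of it as a checklist that refuses to run out of order. The step names are my own shorthand for the action items in the list (the two informational points, 7 and 8, are left out), and nothing here talks to vCenter or SRM; it is just a way to see the sequence.

```python
# Shorthand names for the action steps above, in the order they must happen.
UPGRADE_ORDER = [
    "get_licenses",
    "get_sras",
    "read_release_notes",
    "backup_and_test",
    "upgrade_protected_vc",
    "upgrade_protected_srm",
    "upgrade_recovery_vc",
    "upgrade_recovery_srm",
    "install_licenses",
    "test_failover",
    "upgrade_recovery_esx",
    "upgrade_protected_esx",
    "test_failover_on_esx4",
    "upgrade_virtual_hardware_and_tools",
]


def next_step(completed):
    """Return the next step to run, or None when the upgrade is finished.

    Raises ValueError if the completed steps do not follow the order above.
    """
    done = list(completed)
    if done != UPGRADE_ORDER[:len(done)]:
        raise ValueError("steps completed out of order")
    if len(done) == len(UPGRADE_ORDER):
        return None
    return UPGRADE_ORDER[len(done)]


print(next_step(["get_licenses", "get_sras"]))
```

Note that the Virtual Hardware and Tools upgrade is last on purpose: it is the one step that cannot be done until both sides are on ESX 4 and a test failover has passed.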

You are now completely upgraded, however be aware that you do not have to upgrade your hosts right away.  SRM will protect them at 3.0.3, 3.5, and 4.0.  And remember that ESX 4 VM’s can only failover to ESX 3.5 IF their VH and tools have not been upgraded yet!
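That last reminder is the rule that bites people, so here it is as a tiny check. This is a sketch of the rule as stated in this post only; the function and parameter names are hypothetical, and it assumes (as the steps above imply) that ESX 4 recovery hosts can always take VMs coming from ESX 3.x.

```python
def can_fail_over(vm_hw_or_tools_upgraded, recovery_host_major_version):
    """Can this VM be recovered on a host of the given ESX major version?"""
    if recovery_host_major_version >= 4:
        # ESX 4 recovery hosts accept both older and upgraded VMs.
        return True
    # ESX 3.x recovery hosts only accept VMs whose Virtual Hardware
    # and tools have not been upgraded yet.
    return not vm_hw_or_tools_upgraded


print(can_fail_over(vm_hw_or_tools_upgraded=True, recovery_host_major_version=3))
```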

Release notes - http://www.vmware.com/support/srm/srm_releasenotes_4_0.html
Upgrade KB article - http://kb.vmware.com/kb/1013166
SRM download - http://downloads.vmware.com/d/details/srm400/ZCVwYmQlZCpiQHBA

Use the comments to let me know how your upgrades go!

Michael


08/29/2009

VMworld: Business Continuity What to Watch

As we're all putting the final touches on preparation for VMworld, I wanted to give a quick preview of some of the sessions at VMworld in the business continuity area.  Probably many people headed to VMworld have already figured out their schedules, but given how many people buy their Christmas presents at the last minute I'm sure there are also quite a few people who haven't.  Most of the session content will be posted to vmworld.com after the show, so anyone who has a subscription to vmworld.com (see http://www.vmworld.com/community/subscription/) or who attended VMworld will be able to see that even after VMworld is over.

Here's a list of a few sessions in the areas of availability, data protection, and disaster recovery that you may want to check out.  Note that there are a large number of partners presenting at VMworld that you should also be checking out (see Schedule Builder to see all sessions), but since I'm not as familiar with the content of those sessions I'll be talking just about VMware-presented sessions.

Let's start with Availability:

  • BC3197:  High Availability - Internals and Best Practices
    • We get lots of questions about how to deploy VMware HA.  This session looks under the covers at some of the important details about how HA works and talks about several of the important settings and considerations to take into account when you're deploying VMware HA.
  • BC3370:  VMware Fault Tolerance - Overview and Best Practices
    • One of the most talked-about features in vSphere, Fault Tolerance gets in this session an in-depth explanation of how it works, a list of the requirements to use it, and an explanation of what happens once you turn it on.
  • BC3369:  VMware Fault Tolerance Real-World Use Cases
    • So now that you know how Fault Tolerance works and how to set it up, what can you use it with?  This session starts with a quick review of the technology and then looks at some different scenarios related to use of Fault Tolerance, from providing guidelines for what type of applications could be protected with FT to explaining different scenarios for backup of VMs that have FT turned on.
  • BC3425:  VMware Availability Solutions and Futures
    • This session will provide an overview of the features available in VMware products for availability today and provide a few examples of things that our engineers are working on to extend that technology in the future.

Under Data Protection:

  • BC2142:  Data Recovery - Install / Configure and Troubleshooting
    • Provides an introduction to getting Data Recovery up and running, including planning and architecting backup solutions for your environment with Data Recovery.
  • Plus a lot of sessions from our data protection partners on how their solutions help you with backups of virtual machines and a session from a leading analyst on the topic.

And last but not least, for Disaster Recovery:

  • BC1500:  vCenter Site Recovery Manager "Up and Running" - Best Practices & Avoiding the Pitfalls
    • From the expert known as "Smoggy" on the Forums, this session will walk you through a number of different "gotchas" that customers could encounter while implementing SRM and give you the information you need to avoid getting tripped up by those gotchas.
  • BC3301:  DR Architecture Design Workshop with SRM
    • In this session our services consultants are offering an interactive workshop format that walks you through the process of planning a DR plan for your virtual infrastructure and provides some frameworks for approaching that.
  • BC3396:  How VMware uses Site Recovery Manager for its own Disaster Recovery
    • VMware is deploying Site Recovery Manager to enable automated failover for its Exchange deployment.  Learn about how our IT team architected and deployed a solution to meet the needs of the VMware messaging environment.
  • BC2260:  Automated Disaster Recovery for Branch Offices using SRM and vSphere 4
    • The first of two sessions previewing some work we've been doing for enhancing automated DR with SRM.  This session will focus on some work our engineering team have been doing that we hope to release later this year.
  • BC3421:  SRM Architecture & Features: The Road Ahead
    • This session will start with a review of what SRM can do today and then talk about some of the longer-term directions that our engineers are working on for the future of automated DR.
  • Plus multiple sessions from customers who are leveraging VMware software in their DR solutions and from technology partners who help provide the complete DR solution (including partners who've worked with us on Site Recovery Manager integration)

Looking forward to a great VMworld and hope to see some of you there!

--Jon


07/14/2009

VMware Data Recovery Taking Advantage of vSphere 4

I wanted to explain in more detail why we chose the type of dedupe that we did.  As I  had mentioned in my previous post, we chose to implement block based in-line destination deduplication for VMware Data Recovery (VDR).  There are a few reasons for this, two of which are due to enhancements in the VMware vSphere 4 platform itself.

1) Change block tracking:  Any new VM provisioned on vSphere will use virtual hardware version 7 (you can also upgrade your existing VM version 4 to version 7).  With VM version 7, the vmkernel tracks the changed blocks of the VM’s virtual disks.  (By the way, this is the same change block tracking functionality that enhances Storage VMotion in vSphere 4). So, instead of having to scan the VM’s virtual disks to determine which blocks have changed every time a backup occurs, VDR just makes an API call to the vmkernel and gets this information “for free”.

Thus, VDR is able to dramatically cut down the amount of time and CPU cycles needed to calculate the changed blocks on a virtual disk.  In addition, change block tracking also helps on the restore side of the equation.  For example, if you wanted to restore yesterday’s VM image, VDR will make the reverse change block API call and will just transfer the changed blocks from yesterday’s backup to revert the VM to its previous state.  So, given that there is a lot of intelligence in the platform about virtual disk blocks, block based dedupe seemed like a natural direction for VDR to take.
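To show why this saves so much work, here is a toy model of change block tracking. It is not the vSphere API: `TrackedDisk`, `query_changed_blocks` and `incremental_backup` are hypothetical names, and the real platform identifies changes differently than this simple "reset after each query" set. The point is only that the backup asks for a list of changed blocks instead of scanning the whole disk.

```python
class TrackedDisk:
    """A toy virtual disk that remembers which blocks were written."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.changed = set()

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)  # the "platform" records the change

    def query_changed_blocks(self):
        """Return the changed block indexes and start a new tracking period."""
        changed = sorted(self.changed)
        self.changed.clear()
        return changed


def incremental_backup(disk):
    """Copy only the blocks reported as changed, without scanning the disk."""
    return {i: disk.blocks[i] for i in disk.query_changed_blocks()}


disk = TrackedDisk(num_blocks=8)
disk.write(2, b"alpha")
disk.write(5, b"beta")
first = incremental_backup(disk)

disk.write(5, b"gamma")
second = incremental_backup(disk)

print(sorted(first), sorted(second))  # prints: [2, 5] [5]
```

The second backup touches one block out of eight, which is the kind of saving the paragraph above is describing.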

2) Hot add disk:  VDR can “hot add” virtual disk snapshots directly to the VDR virtual appliance.  This is accomplished by leveraging capabilities of the vSphere storage stack.  This means that VDR can bypass the LAN and stream the data from the snapshots directly to dedupe destination disk.  In addition to reducing load on the LAN and effectively eliminating the need to block out other LAN traffic during the backup window, the streaming of data to the destination dedupe disk on the Data Recovery appliance will be considerably faster.

Note that there are three caveats to enabling hot add disk with VDR:

a.       The source virtual disks need to be on shared storage

b.       The ESX host where the VDR appliance is running needs to have visibility to this shared storage

c.       You will need a vSphere edition that includes Hot Add as a feature
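The three caveats amount to a simple all-or-nothing check, sketched below. The function and parameter names are hypothetical, and the "network" fallback label is my own assumption about what happens when hot add is not available (the data travels over the LAN instead).

```python
def can_hot_add(source_on_shared_storage, appliance_host_sees_storage,
                edition_includes_hot_add):
    """All three caveats must hold before hot add can be used."""
    return (source_on_shared_storage
            and appliance_host_sees_storage
            and edition_includes_hot_add)


def backup_transport(source_on_shared_storage, appliance_host_sees_storage,
                     edition_includes_hot_add):
    """Pick the data path: hot add when possible, otherwise over the network."""
    if can_hot_add(source_on_shared_storage, appliance_host_sees_storage,
                   edition_includes_hot_add):
        return "hot-add"
    return "network"


print(backup_transport(True, True, True))
print(backup_transport(True, False, True))
```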

The knock against destination (or target) based dedupe is the fact that it consumes precious network bandwidth with the unnecessary transfer of data that will be discarded as part of the dedupe process.  However, given that VDR only transfers changed blocks and can transfer these blocks off-LAN, the concern did not apply and thus we felt comfortable with a destination based dedupe architecture.

So does this mean that unless you have both change block tracking and hot add disk features enabled in vSphere 4, VDR and its dedupe capability are useless to you?  Absolutely not!  All data that is protected by VDR will be deduped, so you will enjoy the storage savings independent of what VM version is being backed up or what vSphere edition you have installed.  What change block tracking and hot add disk add is additional efficiency and performance gains that will allow even more data to be protected in an ever shrinking backup window.


06/22/2009

VMware's Backup and Recovery product

One of the many capabilities introduced in VMware vSphere 4 is VMware Data Recovery (VDR), a virtual machine backup and recovery product.  Market research and customer feedback showed that many people wanted an integrated option for protecting virtual machines in a VMware environment.  Further analysis showed that this need was more pronounced for VMware customers that had (or planned to have) fewer than 100 virtual machines in their environment and where IT responsibilities (including VMware) were shared among 2-3 IT administrators (as opposed to having a dedicated VMware administrator on-staff).

VMware has been helping customers address their backup challenges in two ways:  making significant investments in the vStorage APIs for Data Protection that third-party backup tools use to integrate their backup/recovery products with vSphere, and providing an integrated option optimized for vSphere customers with smaller environments.  VDR is built using the vStorage APIs for Data Protection and incorporates a user interface, policy engine and data deduplication - see the diagram below on how it all fits together.  I’ll cover these blocks in a series of blogs but I wanted to start out by discussing Data Deduplication (dedupe).

VDR_Arch

 

Given that we had made a decision to only use disks as the destination for the VDR backups, we had to look for a solution that offered disk storage savings, and this is where dedupe comes in.  In a nutshell, dedupe avoids storing the same data twice. And dedupe is HOT: just check out the mergers and acquisitions news! 

What VMware decided to implement for VDR dedupe is (take a deep breath) - block-based, in-line, destination deduplication.  Deconstructing it means the following:

    1. We discover data commonality at the disk block level as opposed to the file level.

    2. It is done as we stream the backup data to the destination disk as opposed to a post-backup process.

    3. The actual dedupe process occurs as we store the data on the destination disk as opposed to when we are scanning the source VM’s virtual disks prior to the backup.
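To make the three properties above concrete, here is a minimal sketch of block-based, in-line, destination-side deduplication. It is an illustration only and not VDR's implementation: the 4 KB block size, the SHA-256 hash, and the `DedupeStore` class are all assumptions chosen for the example.

```python
import hashlib
import io

BLOCK_SIZE = 4096  # assumed block size; the post does not say what VDR uses


class DedupeStore:
    """Hypothetical destination store that keeps each unique block once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def write_stream(self, stream):
        """Dedupe while the backup stream arrives (in-line, not post-process)."""
        recipe = []  # ordered digests needed to rebuild this backup
        while True:
            block = stream.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).hexdigest()
            # The decision to store or skip happens at the destination.
            if digest not in self.blocks:
                self.blocks[digest] = block
            recipe.append(digest)
        return recipe

    def read_back(self, recipe):
        """Rebuild a backup from its list of block digests."""
        return b"".join(self.blocks[d] for d in recipe)


store = DedupeStore()
vm1 = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
vm2 = b"A" * BLOCK_SIZE * 2 + b"C" * BLOCK_SIZE
r1 = store.write_stream(io.BytesIO(vm1))
r2 = store.write_stream(io.BytesIO(vm2))
print(len(r1) + len(r2), "logical blocks,", len(store.blocks), "stored")
```

The two sample "disks" share their repeated blocks, so the store holds far fewer blocks than were written, which is the data-driven saving described in this post.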

When it comes to deduplication, there are different techniques and hash algorithms used to accomplish the result.  I am not going to get into a theoretical discussion of the pros and cons of the various types of dedupe technologies available and which approach provides the best disk savings.  I personally think that it depends entirely on the customer's IT environment constraints and overall business goals.  Besides, much of the storage savings is going to be data driven anyway (the more data commonality there is, the better the dedupe rate).  We chose this dedupe architecture because it fit best with what we were trying to achieve with VDR and what the vSphere platform provided to us.  What were these reasons?  Stay tuned to this space...


02/03/2009

Failback? Absolutely! Absolutely!

Since VMware released vCenter Site Recovery Manager (SRM) in June 2008, the product has gained tremendous momentum in customer adoption. Customers are impressed with the SRM capabilities that turn their manual disaster recovery (DR) runbook into automated recovery plans. They execute the recovery plans in test mode as frequently as they desire in order to eliminate any glitches in the plans. When a disaster happens, they are confident that SRM can be entrusted to seamlessly perform disaster recovery for them. (See below for the DR and SRM terminology used in this blog.)

| DR and SRM Terminology | Description |
|---|---|
| Failover | Event that occurs when the recovery site takes over operation in place of the protected site after the declaration of a disaster. |
| Failback | Reversal of failover, returning IT operations to the primary site (Site A). |
| Site A | The protected site before failover. |
| Site B | The recovery site before failover. |
| Protection Group | A group of virtual machines that will be failed over together to the recovery site during test or recovery. |
| Recovery Plan | A recovery plan contains the complete set of steps needed to recover (or test recovery of) the protected virtual machines in one or more protection groups. |
| Shadow Virtual Machines | An artifact in the recovery site VC inventory that represents a protected VM from the protected site VC. |

With SRM failover capabilities, customers can successfully recover their workloads on Site B.  If you are one of those customers, do you wonder how to failback to Site A? I bet you do!

Do SRM customers need failback capabilities?  Absolutely!

Many SRM customers have told us at VMware that failback is important to them for reasons such as:

·         They do not rely on their recovery site for an extended period of time. When Site A is recovered, customers prefer to have the workloads running on that site instead.  Site A (the primary site) is typically allocated more computing resources than Site B (the recovery site) and is geographically closer to the business units.  From a performance standpoint, it makes sense to fail back to Site A in many cases.

·         They may need to failover to Site B (in recovery mode) as part of their scheduled disaster recovery testing or maintenance.  Afterwards, they need to failback to Site A.

Does SRM v1.0 make failback easier? Absolutely!

From the customer feedback we gathered, we understand how important failback is to our customers. Even with SRM v1.0, customers have many options to streamline and expedite their failback process. Using SRM to perform failback provides tremendous value:

·         Automated recovery plan(s)

·         Automated testing before recovery

·         Built-in audit trail

Now you probably wonder what is involved in using SRM for failback. In a nutshell, SRM-assisted failback involves two directional reversals of protection: from Site B to Site A, and then back from Site A to Site B. To reverse the direction of protection so that it runs from Site B to Site A, the following steps are involved:

1.       Reverse the replication direction in the storage layer to be from Site B to Site A

2.       Clean up the shadow virtual machines and protection groups on Site A

3.       Clean up the Recovery Plans configured on Site B

4.       Configure the protection group(s) on Site B

5.       Configure the Recovery Plans on Site A

6.       Test recovery from Site B to Site A

7.       Perform the recovery from Site B to Site A
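Because the reversal is the same seven steps with the two sites swapped, the list above can be written once as a template. The sketch below is a hypothetical checklist generator, not an SRM tool or API; the function name `reversal_checklist` and the `src`/`dst` placeholders are made up for illustration, and the step wording is copied from the list above.

```python
# Step templates taken from the list above; {src} is the site currently
# running the workloads and {dst} is the site being recovered to.
STEPS = [
    "Reverse the replication direction in the storage layer to be from {src} to {dst}",
    "Clean up the shadow virtual machines and protection groups on {dst}",
    "Clean up the Recovery Plans configured on {src}",
    "Configure the protection group(s) on {src}",
    "Configure the Recovery Plans on {dst}",
    "Test recovery from {src} to {dst}",
    "Perform the recovery from {src} to {dst}",
]


def reversal_checklist(src, dst):
    """Return the numbered steps for one directional reversal of protection."""
    return [
        f"{number}. {step.format(src=src, dst=dst)}"
        for number, step in enumerate(STEPS, start=1)
    ]


# First reversal, exactly as listed above.
for line in reversal_checklist("Site B", "Site A"):
    print(line)
```

Calling `reversal_checklist("Site A", "Site B")` produces the same checklist for the opposite direction.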

Afterwards, you will need to perform steps 1 – 7 for the directional reversal of protection from Site A to Site B before you can call the failback done.  You can find more details on failback in the resources listed below:

·         Chapter 6 of SRM Evaluator’s Guide: http://www.vmware.com/pdf/srm_10_eval_guide.pdf

·         The storage vendors that implement an SRA (Storage Replication Adapter) for SRM have also produced their own documents that describe how to reverse the replication direction in the storage layer and how to leverage SRM to perform failback. Mike Laverick, author of the book “Administering VMware’s Site Recovery Manager”, has started a thread on the VMware SRM community forum to track a list of documents published by storage vendors on SRM: http://communities.vmware.com/message/1037176#1037176. The list includes some documents that describe how to perform failback using SRM on specific storage platforms.

You may think that the SRM-assisted failback steps illustrated above are not trivial to implement. Once you compare the man-time spent on SRM-assisted failback with the man-time on manual failback, you will appreciate the benefits of SRM-assisted failback. The table below lists the man-time estimates of the failback operations with and without SRM, assuming a protected environment of 100 virtual machines:

| Failback Steps | SRM-assisted Operations | Manual Operations | Man-Time Estimates of SRM-assisted Operations | Man-Time Estimates of Manual Operations |
|---|---|---|---|---|
| Reverse the replication direction in the storage layer | Use storage vendor’s configuration tool to reverse the replication direction | Same as SRM-assisted operations | 15 minutes | 15 minutes |
| Clean up the shadow virtual machines and protection groups on Site A | Use SRM Plug-in to perform the clean-up | Same as SRM-assisted operations | 10 minutes | 10 minutes |
| Clean up the Recovery Plans configured on Site B | Use SRM Plug-in to perform the clean-up | Same as SRM-assisted operations | 10 minutes | 10 minutes |
| Failover Configuration(s) in SRM | Complete Array Manager configuration on Site B | N/A | 10 minutes | 0 minutes |
| Failover Configuration(s) in SRM | Configure Protection Groups | N/A | 10 minutes to 2 hours depending on the level of customization | 0 minutes |
| Failover Configuration(s) in SRM | Configure Recovery Plans | N/A | 10 minutes to 2 hours depending on the level of customization | 0 minutes |
| Testing | Automated via SRA | Go through all the manual runbook operations | 60 minutes | 3 to 4 man-days (8-hour days) depending on the complexity of manual coordination |
| Stop current cycle of replication | Use storage vendor’s configuration tool to perform this step | Use storage vendor’s configuration tool to perform this step | 10 minutes | 10 minutes |
| Make the target LUN (i.e. remote volume) a primary volume | Automated via SRA | Use storage vendor’s configuration tool to perform this step | 0 minutes | 10 minutes |
| Make the target LUN read-writeable | Automated via SRA | Use storage vendor’s configuration tool to perform this step | 0 minutes | 10 minutes |
| Grant the ESX Server hosts in Site B access to the last good snapshot that was taken | Automated via SRA | Use storage vendor’s configuration tool to perform this step | 0 minutes | 10 minutes |
| Network Mapping | Automated | Edit the VMX file of each virtual machine and map it to the correct network | 0 minutes | 2 minutes for editing each of the 100 virtual machines = 200 minutes |
| Resource Pool Mapping | Automated | For each VM, add it and tell the VI Client which cluster, folder and resource pool to use | 0 minutes | 2 minutes for adding each of the 100 virtual machines = 200 minutes |
| VM Folder Mapping | Automated | Done in the above step | 0 minutes | N/A |

To perform all the steps listed in the table above in a protected environment of 100 virtual machines, the time estimates for SRM-assisted operations and manual operations are 355 minutes and 1915 minutes respectively, a ratio of roughly 1 to 5. In other words, you can expect to spend about 5 times as long on failback if you opt for the manual operations instead of the SRM-assisted operations. This is a significant time saving, on top of the time saved on fixing human errors and the audit trails that you get with SRM.
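The two totals can be reproduced from the table with a few lines of arithmetic. The choice of bounds in this sketch is an inference from the numbers rather than something the table states: the totals only come out to 355 and 1915 if each "10 minutes to 2 hours" step is counted at 2 hours and manual testing is counted at 3 man-days.

```python
WORKDAY_MIN = 8 * 60  # the table defines a man-day as 8 hours
VM_COUNT = 100

# (step, SRM-assisted minutes, manual minutes), in table order
rows = [
    ("Reverse replication direction", 15, 15),
    ("Clean up shadow VMs and protection groups on Site A", 10, 10),
    ("Clean up Recovery Plans on Site B", 10, 10),
    ("Array Manager configuration on Site B", 10, 0),
    ("Configure Protection Groups", 120, 0),  # assumed upper bound (2 hours)
    ("Configure Recovery Plans", 120, 0),     # assumed upper bound (2 hours)
    ("Testing", 60, 3 * WORKDAY_MIN),         # assumed lower bound (3 man-days)
    ("Stop current cycle of replication", 10, 10),
    ("Make the target LUN a primary volume", 0, 10),
    ("Make the target LUN read-writeable", 0, 10),
    ("Grant ESX hosts access to the last good snapshot", 0, 10),
    ("Network mapping", 0, 2 * VM_COUNT),
    ("Resource pool mapping", 0, 2 * VM_COUNT),
    ("VM folder mapping", 0, 0),              # N/A, done in the previous step
]

srm_total = sum(srm for _, srm, _ in rows)
manual_total = sum(manual for _, _, manual in rows)

print(f"SRM-assisted: {srm_total} minutes")
print(f"Manual:       {manual_total} minutes")
print(f"Ratio:        1 to {manual_total / srm_total:.1f}")
```

With 4 man-days of manual testing instead of 3, the manual total rises by another 480 minutes, so the gap only widens.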

If you still prefer to do it the manual way, you can leverage tools and scripts to make it more streamlined.  Refer to the documents below for more information:

·         Chapter 7 of VMBook titled “A Practical Guide to Business Continuity & Disaster Recovery with VMware Infrastructure”: http://www.vmware.com/files/pdf/practical_guide_bcdr_vmb.pdf. This chapter provides insights on failover and failback. You will learn more about failback considerations after reading it.

·         http://www.rtfm-ed.co.uk/docs/vmwdocs/Chapter%2012_Site_Recovery_without_VMware_SRM.pdf provides insights and technical details on how to use PowerShell to automate the failback process. Even the document's author, Mike Laverick, recommends that you use SRM and perform the manual operations only as a contingency plan.