How Does vSphere Replication Work?

With SRM 5 we introduced a new alternative for replication of virtual machines called "vSphere Replication" or "VR" for short.  There has been some excellent conversation about VR generated by presentations at VMworld and the release of SRM on the 15th of September.

We've also received a lot of questions about the details of VR, and thought this would be an excellent venue and opportunity to give you some more detail on how it actually works behind the scenes to protect your VMs.

What is vSphere Replication?

It is a replication engine for virtual machine disk files.  It tracks changes to VMs and ensures that the blocks that have changed are replicated to a remote site within a specified recovery point objective.

How does VR work?

Fundamentally, VR is designed to continually track I/O destined for a VMDK file and keep track of which blocks are being changed.  There is a user-configured Recovery Point Objective (RPO) for every VMDK, and the job of VR is to ensure that the changed blocks are copied across the network to the remote site at a rate sufficient to keep the replica in sync with the primary in accordance with the configured RPO.

If VR is successful in doing so, the replica at the remote site can be recovered as part of a recovery plan within SRM.

How do you configure VR?

This is not very difficult at all!  There are a few places you can configure VR for a VM or set of VMs, either from within SRM or even by directly editing the properties of the VM from within the vSphere Client.

For example, you can right-click on a VM and select "vSphere Replication" as one of the popup menu items:

[Screenshot: VR-for-VM]

Once you select VR properties you can choose an RPO, a source VMDK, a target folder, or even a pre-seeded copy of the VM at the remote site to act as the replica!

[Screenshot: Rep-Settings]

How VR determines what is different and what needs to be replicated

There are two forms of synchronization that VR will use to keep systems synchronized.  When VR is first configured for a virtual machine you can choose a primary disk file or set of disk files and a remote target location to hold the replica.  This can be an empty folder, or it can be a copy of the VMDK that has the same UUID as the primary protected system.

The first thing VR will do when synchronizing is read the entire disk at both the protected and recovery sites and generate a checksum for each block.  It then compares the checksum maps of the two disk files and from that builds the initial bundle of blocks that must be replicated on the first pass to bring the block checksums into alignment.  This happens on port 31031.

This is called a "full sync" and happens only rarely: usually just on the first pass when the VM is configured for VR, though it can also happen in other situations, such as when recovering from a crash.
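The checksum comparison behind a full sync can be sketched in a few lines of Python.  This is only an illustration of the idea, not VR's actual code: the 4 KB block size, the choice of SHA-256, and the function names are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the article does not state the real one


def block_checksums(data, block_size=BLOCK_SIZE):
    """Return one checksum per fixed-size block of a disk image."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def blocks_to_replicate(primary, replica, block_size=BLOCK_SIZE):
    """Return the indexes of primary blocks the replica is missing or has wrong."""
    source = block_checksums(primary, block_size)
    target = block_checksums(replica, block_size)
    differing = []
    for index, digest in enumerate(source):
        # A block must be shipped if the replica is too short to contain it
        # or if the two checksums disagree.
        if index >= len(target) or target[index] != digest:
            differing.append(index)
    return differing
```

An empty target folder corresponds to an empty replica here, so every block is shipped; a pre-seeded copy only needs the blocks whose checksums differ.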

Ongoing replication relies on an agent and a vSCSI filter that reside within the kernel of an ESXi 5.0 host.  Together they track the I/O, keep an in-memory bitmap of changed blocks, and back this with a "persistent state file" (.psf) in the home directory of the VM.  The psf file contains only pointers to the changed blocks.  When it is time to replicate the changes to the remote site, as dictated by the RPO set for the VMDK, a bundle is created with the changed blocks and sent to the remote site to be committed to disk.  This replication happens on port 44046.
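As a rough mental model of the change tracking described above, here is a small Python sketch.  It is not how the vSCSI filter is implemented: the class name, the 4 KB block size, and the use of a set instead of a real bitmap are assumptions for illustration, and the persistent state file is left out entirely.

```python
class ChangedBlockTracker:
    """Toy stand-in for the in-memory record of changed blocks (hypothetical)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed granularity, not VR's real value
        self.dirty = set()            # indexes of blocks written since last bundle

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def take_bundle(self):
        """Return the changed block indexes in order and start tracking afresh."""
        bundle = sorted(self.dirty)
        self.dirty.clear()
        return bundle
```

The tracker only records which blocks changed, never their contents, which mirrors the point that the psf file holds pointers rather than data.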

How is the schedule for replication determined?  Can I create my own schedules?

You cannot create your own schedules for replication, because there is a lot of intelligence built into the algorithm VR uses to ship blocks.  

Based on the RPO that acts as the outside window for replication, VR will attempt to send blocks using some dynamic computation to figure out how aggressively it needs to send data.

If, for example, the RPO is set for 1 hour and there is a very small historical change rate to blocks, VR does not need to act aggressively.  We take into account the last 15 transfers to the remote site to calculate on average how much data is likely to be shipped in the current bundle.  If the data took, for example, 10 minutes on average to ship and commit, we estimate that we will not need more than 10 minutes for the next set of data, and can schedule the next transfer to start no later than the 50-minute mark to stay within the 1 hour RPO. 

If, however, the RPO is set to 1 hour and transfers have historically taken 35 minutes to ship and commit, then we know that we will eventually exceed our RPO: each transfer runs 5 minutes past the half-way point, and that overrun will catch up with the RPO even if we start shipping blocks immediately on completion of the previous bundle!

So the point is that VR takes all of these factors into account and sets its own schedule for shipping changed-block bundles.  It considers how large the transfer is, how much change is taking place, how long past transfers have taken, and so forth, and adjusts the schedule or raises alerts accordingly.
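The two one-hour examples above can be worked through with a short Python sketch.  It captures only the arithmetic described in this post (an average over the last 15 transfers, a latest start time, and the half-way check); the real scheduler weighs more factors, and every name below is hypothetical.

```python
from collections import deque

HISTORY_WINDOW = 15  # the post says the last 15 transfers are considered


class TransferHistory:
    """Rolling record of how long recent transfers took, in minutes."""

    def __init__(self, window=HISTORY_WINDOW):
        self.durations = deque(maxlen=window)  # oldest entries fall off

    def record(self, minutes):
        self.durations.append(minutes)

    def average(self):
        if not self.durations:
            return 0.0
        return sum(self.durations) / len(self.durations)


def latest_start(rpo_minutes, history):
    """Latest minute in the RPO window at which the next transfer can begin."""
    return rpo_minutes - history.average()


def rpo_at_risk(rpo_minutes, history):
    """True when the average transfer runs past the half-way point of the RPO."""
    return history.average() > rpo_minutes / 2
```

With fifteen 10-minute transfers recorded, `latest_start(60, history)` returns 50.0 and `rpo_at_risk(60, history)` is false; with fifteen 35-minute transfers, `rpo_at_risk(60, history)` is true.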

How data gets transferred and how it gets written

Because the VR agent works with a passive filter that tracks changes, all we worry about is changed blocks, not the format of the disk or file system or anything.  

At the recovery site you will need to deploy a virtual appliance called the "vSphere Replication Server" (VRS) that acts as the target for the VR agent.  The VRS receives the blocks from the agents at the protected site and waits until the bundle is completely received and consistent, then passes it off to the ESXi host's network file copy (NFC) service, which writes the blocks to the target storage we specified when we configured VR for the protected VM.

The result is that the entire process is abstracted from the storage until the blocks are given to the NFC, and that means we can mix and match storage: we can have thick or thin provisioned VMDKs at either site, and use any type of storage we choose at either site.  The NFC of the host the VRS interacts with just writes to a VMDK.  In essence, the VRS receives the block transfer and the NFC writes it out.

It's important to note that the traffic from the VR agent is sent across the vmkernel management NICs of your ESXi hosts, so be aware that you will see a lot more traffic on those switches.
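The "receive everything first, then write" behaviour of the VRS can be illustrated with a small Python sketch.  This is a toy model under stated assumptions, not the appliance's real logic: the class name and methods are hypothetical, a `bytearray` stands in for the target VMDK, and the NFC hand-off is reduced to a plain in-memory write.

```python
class BundleReceiver:
    """Toy model of buffering a bundle until it is complete (hypothetical)."""

    def __init__(self, expected_blocks, block_size):
        self.expected = set(expected_blocks)  # block indexes the bundle must hold
        self.block_size = block_size
        self.received = {}                    # block index -> block contents

    def receive(self, index, data):
        """Buffer one block; nothing touches the target disk yet."""
        if index not in self.expected:
            raise ValueError("block %d is not part of this bundle" % index)
        if len(data) != self.block_size:
            raise ValueError("block %d has the wrong size" % index)
        self.received[index] = data

    def is_complete(self):
        return set(self.received) == self.expected

    def commit(self, disk):
        """Write the whole bundle to `disk`, but only once every block arrived."""
        if not self.is_complete():
            raise RuntimeError("bundle incomplete; nothing written")
        for index in sorted(self.received):
            start = index * self.block_size
            disk[start:start + self.block_size] = self.received[index]
```

Because nothing is written until the bundle is complete, an interrupted transfer leaves the replica as it was after the previous successful commit.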

Hopefully this gives you a little more insight into how vSphere Replication works.  If you've got questions or want more detail, please leave a note in the comments!  If you think vSphere Replication is great or 'not' please let me know that as well, and let's talk about why you think what you do.

We've got high hopes that VR will give our smaller customers a new capability to approach DR, and our larger customers the ability to tier out their replication offerings.

What do you think?  Is vSphere Replication a good fit for your environment?

-Ken


31 thoughts on “How Does vSphere Replication Work?”

  1. Ian Campbell

    Do you have any comments on the scalability of vSphere Replication? Can it totally replace storage based replication?

  2. Ken Werneburg

    Great question Ian! The biggest scalability limit, and the one that really acts as the limiting factor when compared with storage-based replication, is that the lowest RPO VR supports is 15 minutes.
    The maximum number of VR protected VMs per instance is 500, but that’s not as common an issue as the 15 minute RPO. Obviously this is a pretty key limitation, and gives pretty good guidance where array-based replication makes sense vs vSphere Replication.

  3. Michael White

    Hi Ian,
    Another thing I think that is important is that most of the array providers out there have agents that can provide application consistency in replication. VR has none of that. But, in many workloads I do think VR can stand on its own quite well!
    Michael

  4. Adrian James

    Ken, I’ve seen no mention of using array-based replication and VR at the same time. Is this a supported configuration? E.g. I am quite happy with array replication from head office to the DR site, but I would like to use VR at some branch offices. Can you place the VR appliance at multiple branch locations if they are all part of the same vCenter instance?
    Thanks

  5. Ken Werneburg

    Hi Adrian, you can absolutely mix VR and ABR together at the same time. There’s no concern there whatsoever.
    We pretty much assume that a standard model will be ABR between major sites and perhaps VR from remote offices to those main offices.
    Keep in mind however that scalability right now allows for 3:1 array-based replication site layout and 7:1 VR site layout.
    Moreover your comment about them all being part of the same vCenter instance makes me curious.
    Do you mean all your sites are being managed by one VC? You’ll definitely need multiple VCs…

  6. Adrian James

    Sorry, I meant all the branch offices and head office are under one vCenter, and the DR site under another vCenter instance.

  7. Russ Hargrove

    nice article, our situation is slightly different, we don’t have a secondary site for replication what we do have is two remote sites with a single ESXi host server each and a handful of VMs… can we use SRM5 to replicate those VMs to the ESXi clusters we have at our central datacenter?

  8. Will Lin

    Ken,
    If VR uses the vmkernel management interface of the ESXi host for the replication traffic, does this change our thinking regarding network design on an ESXi host? For example, do we now need to consider creating a vmk interface dedicated to VR traffic as a best practice? Can we tell VR to use a specific vmk interface when several are present?

  9. Ken Werneburg

    Adrian, interesting scenario. Each VRMS is attached to a single VC and therefore it should work. The network routing might become interesting, as you’d need each ESXi host’s kernel traffic to be able to route to the DR site, since the traffic is sent directly to the VR Server at the DR site. I suspect that would be your limiting factor.
    It’s *designed* to have VC pairs built for each site but I suspect your scenario would work as long as DNS and routing worked, but I can’t give definitive guidance on that.

  10. Ken Werneburg

    Russ,
    You could do that indeed, presuming there are VCs managing each site.
    Then you would create a pair between Site A&C and another pair between Site B&C. It’s a well documented scenario for doing “N:1” failovers that hasn’t changed with 5.0, except that you can now do the same with VR instead of array replication.
    Check out: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1014640
    That’s the old 4.1 method for installing SRM in an N:1 model.

  11. Ken Werneburg

    Will,
    Yes it might well cause you to rethink the architecture to take into account VR traffic if you are going to be using it heavily.
    You can set up dedicated VMK interfaces on the ESXi hosts for VR to use, but the only way to tell it to use them is by routing tables: i.e. send all traffic destined for the VRS at the recovery site over that interface.
    I’d suggest you’d need to be driving a lot of traffic to worry about this, but it may well be worth investigating indeed!

  12. Owen

    Thanks for the post! This product is looking better and better. I hope we can drop our third party solutions in favor of an integrated one!

  13. Ronny

    Are there any plans to support application-level consistency (e.g. VSS integration) for VR? Does VR have something like consistency groups for multiple VMs?

    1. Ken

      Yes, it will work, but keep in mind that the iSCSI volume that is attached will all be network traffic, not disk, so we are not going to replicate that data. Only if it’s being written to a vmdk will we track and replicate it.

  14. Douglas Hanley

    Hi Ken,

    Thanks for this post. Is it possible to track the actual recovery point that has been achieved for a particular VM relative to the RPO itself? I would like to be able to see where my WAN bandwidth has prevented a particular VM from being protected to the levels I specified. If this is possible can I do both within SRM and through the VI web services?

    Thanks,

    Douglas.

    1. Ken

      You can see high level information in the SRM interface about the last size/time/duration of the last sync, and also in the VR status in the web UI. For historical detail like that you can go to the “Events” tab for any given VM and look at the historical data for the past replications.

  15. Chris

    Replication is nice and works pretty well on our sites. But I can’t find any hint about how to make the backup site the online site. The replicated virtual machines are on the destination disk now (we have two sites, but only one vCenter).

    Do I need Site Recovery Manager to get the replicated machines working? In that case I’ll use my third-party application, which is already paid for…
    So what do I do when my production site goes down?

  16. Tim Brown

    I have VR configured and I am noticing a lot of redo logs in the DR site, which are taking up more space than the original at the live site for some reason. Can I merge these somehow at the DR site?

  17. Neil McKay

    The article certainly clarifies in my mind how the process works except for one area.

    Does the host on which the VRS resides perform all writes to the shared storage where the replicated vmdk’s reside or is this spread out somehow between any or all of the hosts in the cluster that have access to those datastores?

  18. Tim Brown

    The DR environment consists of a single host so the VRS resides on that host which has access to all the datastores.

  19. Miguel V.

    Hello, what happens if you add and configure replication for, say, a 1 TB thin provisioned disk on the source VM while it is replicating? Would it replicate the full 1 TB of empty data?

  20. Geraldo Crisostomo

    I have one question. Which “path” does vSphere Replication use? Is it Ethernet copper or fiber? Or both?

  21. Allen Crowe

    Hi Ken,
    Thanks for the very informative post!

    Can you comment further on the scheduler algorithm?

    We have a very slow link between sites that is used heavily during business hours.
    With the RPO set to 24hrs will the scheduler be smart enough to run replications at night when there is less traffic, or at least lower its bandwidth enough so as to not flood the pipe?

  23. Bharath Reddy

    Hi Ken,

    The article is very interesting. Thanks for making so much useful info available. Here is our concern with vSphere-based replication. We are replicating 17 VMs from the primary site to the recovery site, with 2 VR provisioned at each site. The traffic used for this replication is huge: it runs to GBs even for a small VM with a small VMDK. Can you please help us with this?

  24. Santhosh

    Thanks Ken for this post. This question is about the “initial sync” you described, which creates the checksums and copies the blocks. What if a block changes (and so its checksum changes too) because of a write operation before the initial sync completes? Is VR using a snapshot to freeze the disk for the initial sync?
