We often get requests asking how to estimate vSphere Replication network bandwidth utilization. This can be difficult because several variable factors influence how much traffic vSphere Replication (VR) generates. Two key factors are the data change rate in the virtual machine (VM) and the Recovery Point Objective (RPO) setting in VR for the VM. The data change rate can be difficult to determine and is rarely constant. Fortunately, one of the engineers here at VMware built a virtual appliance that calculates and graphs the amount of replicated data generated by a VM and the bandwidth that would be consumed when replicating that VM with VR.
Before we get into the details of the vSphere Replication Capacity Planning Appliance, let’s discuss RPO for a moment. The RPO in VR can be set to anywhere from 15 minutes to 24 hours on a per-VM basis. RPO is selected when you configure (or reconfigure) replication for a VM.
A lower RPO generates more replication traffic. Someone recently asked me how that can be true. This person essentially said, “It should not matter what the RPO is set to; you are still replicating the same amount of data whether you send it once every 24 hours or in smaller chunks every hour, right?” That is not the case, because VR sends only the most recent version of the data. As a straightforward example, consider a block of data that changes once per hour. If the RPO is set to one hour, that block is replicated roughly 24 times throughout the day. I say “roughly” because the VR algorithm is more complex than that; I am keeping it simple to illustrate the point. If the RPO is set to 24 hours, only the most recent version of the block is replicated, once per day. To extend the example, say that block of data is 1MB. The one-hour RPO would then require VR to replicate a total of 24MB in a day, versus 1MB with a 24-hour RPO:
One-hour RPO: 1MB once per hour x 24 hours = 24MB in one day
24-hour RPO: 1MB once per day = 1MB in one day
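The same arithmetic can be written as a small Bash function. This is a sketch of the simplified model above only, not the actual VR algorithm: it assumes a single block that changes at a fixed interval, and that VR sends the latest version of that block once per RPO cycle in which it changed. The function name daily_replicated_mb is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model only (not the real VR algorithm): one block of block_mb
# changes every change_min minutes, and VR sends the latest version of the
# block once per RPO cycle in which it changed.
daily_replicated_mb() {
  local block_mb=$1 change_min=$2 rpo_min=$3
  local minutes_per_day=1440
  # The block is sent at most once per RPO cycle and only when it changed,
  # so the longer of the two intervals sets how often it is sent.
  local interval=$(( rpo_min > change_min ? rpo_min : change_min ))
  # Integer arithmetic: results are rounded down to whole MB.
  echo $(( block_mb * minutes_per_day / interval ))
}

daily_replicated_mb 1 60 60    # one-hour RPO: prints 24
daily_replicated_mb 1 60 1440  # 24-hour RPO: prints 1
```

Under this model a 240-minute (4-hour) RPO gives 6MB per day, while a 15-minute RPO still gives 24MB, because the block only changes once per hour.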
Again, this is a basic illustration meant only to show how much the RPO setting can affect the amount of network traffic generated by VR. Key takeaway: To minimize network bandwidth consumption, determine the RPO requirement for the VM and set the RPO in VR to that requirement. For example, if the requirement is a 4-hour RPO, set the VR RPO policy to 4 hours. Don’t set the RPO in VR to 15 minutes just because you can.
Now that we have the RPO discussion out of the way, let’s focus on the original topic: the vSphere Replication Capacity Planning Appliance. The appliance is deployed from an OVF package as a preconfigured Linux-based virtual machine with 2 virtual CPUs and 4GB of memory. It is a purpose-built appliance, similar to VR, except that it does not write the “replicated” data to disk. It should not be installed in the same vCenter Server environment as VR. In other words, if you already have VR deployed, do not deploy the capacity planning appliance in the same environment, as it may interfere with VR.
Once the appliance is deployed, use a VM console connection or SSH to log into the appliance and configure replication for a VM:
cd /opt/vmware/hbrtraffic/bin/
./configureReplication --vc=10.12.106.126 --vcuser=administrator --vcpass=password --lwd=10.144.107.14 --vmname=dbserver01 --rpo=240
Most of the options in the command above are self-explanatory. Note that --lwd is the IP address of the capacity planning appliance and --rpo is measured in minutes (4 hours x 60 minutes = 240 in my example above). You can view the complete list of options by entering ./configureReplication --help
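If you script this step, a wrapper along these lines can convert an RPO given in whole hours to the minutes value that --rpo expects and check it against the 15-minute to 24-hour range VR allows. This is a sketch: the function names rpo_hours_to_minutes and enable_capacity_planning and the variables CONFIGURE_CMD, VC_HOST, VC_USER, VC_PASS, and LWD_IP are hypothetical, and only the configureReplication options shown above are used. Setting CONFIGURE_CMD=echo prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical settings; override them in the environment.
# Set CONFIGURE_CMD=echo for a dry run that only prints the command.
CONFIGURE_CMD=${CONFIGURE_CMD:-/opt/vmware/hbrtraffic/bin/configureReplication}
VC_HOST=${VC_HOST:-10.12.106.126}
VC_USER=${VC_USER:-administrator}
VC_PASS=${VC_PASS:-password}
LWD_IP=${LWD_IP:-10.144.107.14}

# Convert a whole-hour RPO to the minutes value --rpo expects,
# rejecting values outside VR's 15-minute to 24-hour range.
rpo_hours_to_minutes() {
  local hours=$1
  if ! [[ $hours =~ ^[0-9]+$ ]]; then
    echo "RPO must be a whole number of hours: $hours" >&2
    return 1
  fi
  # 10# forces base 10 so values such as 08 are not read as octal.
  local minutes=$(( 10#$hours * 60 ))
  if (( minutes < 15 || minutes > 1440 )); then
    echo "RPO out of range (15 minutes to 24 hours): ${hours}h" >&2
    return 1
  fi
  echo "$minutes"
}

# Enable capacity planning for one VM. Quoting each value keeps
# passwords containing spaces or glob characters such as * intact.
enable_capacity_planning() {
  local vmname=$1 rpo_min
  rpo_min=$(rpo_hours_to_minutes "$2") || return 1
  "$CONFIGURE_CMD" --vc="$VC_HOST" --vcuser="$VC_USER" --vcpass="$VC_PASS" \
    --lwd="$LWD_IP" --vmname="$vmname" --rpo="$rpo_min"
}
```

With the default values, enable_capacity_planning dbserver01 4 runs the same configureReplication command as above, with --rpo=240.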
Here are a few frequently asked questions (FAQs) that typically come up at this point:
How do I disable capacity planning for a VM? Use the --remove option. Example: ./configureReplication --vc=10.12.106.126 --vcuser=administrator --vcpass=password --lwd=10.144.107.14 --vmname=dbserver01 --remove
Can I change the RPO? Yes, but it is a two-step process: you must remove replication using the --remove option and then re-enable replication with the new RPO.
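The two-step RPO change could be scripted roughly as follows. This is a sketch: the function name change_rpo and the variables CONFIGURE_CMD, VC_HOST, VC_USER, VC_PASS, and LWD_IP are hypothetical, the new RPO is passed in minutes as --rpo expects, and the new RPO is applied only if the --remove step succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical settings; set CONFIGURE_CMD=echo for a dry run.
CONFIGURE_CMD=${CONFIGURE_CMD:-/opt/vmware/hbrtraffic/bin/configureReplication}
VC_HOST=${VC_HOST:-10.12.106.126}
VC_USER=${VC_USER:-administrator}
VC_PASS=${VC_PASS:-password}
LWD_IP=${LWD_IP:-10.144.107.14}

# Change the RPO for one VM: remove replication, then re-enable it
# with the new RPO in minutes. Stops if the removal fails.
change_rpo() {
  local vmname=$1 new_rpo_min=$2
  # Options shared by both configureReplication calls.
  local common=(--vc="$VC_HOST" --vcuser="$VC_USER" --vcpass="$VC_PASS"
                --lwd="$LWD_IP" --vmname="$vmname")
  "$CONFIGURE_CMD" "${common[@]}" --remove || return 1
  "$CONFIGURE_CMD" "${common[@]}" --rpo="$new_rpo_min"
}
```

For example, change_rpo dbserver01 60 would move dbserver01 from a 240-minute RPO to a one-hour RPO.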
Can I enable capacity planning for more than one VM? Yes.
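To enable capacity planning for several VMs in one go, a loop can call configureReplication once per VM. In this sketch the function name enable_many and the variables CONFIGURE_CMD, VC_HOST, VC_USER, VC_PASS, and LWD_IP are hypothetical; every VM gets the same RPO in minutes, and the loop reports any VM that fails instead of stopping at the first error.

```shell
#!/usr/bin/env bash
# Hypothetical settings; set CONFIGURE_CMD=echo for a dry run.
CONFIGURE_CMD=${CONFIGURE_CMD:-/opt/vmware/hbrtraffic/bin/configureReplication}
VC_HOST=${VC_HOST:-10.12.106.126}
VC_USER=${VC_USER:-administrator}
VC_PASS=${VC_PASS:-password}
LWD_IP=${LWD_IP:-10.144.107.14}

# Usage: enable_many RPO_MINUTES VM_NAME...
# Tries every VM and returns non-zero if any of them failed.
enable_many() {
  local rpo_min=$1 failed=0 vm
  shift
  for vm in "$@"; do
    if ! "$CONFIGURE_CMD" --vc="$VC_HOST" --vcuser="$VC_USER" --vcpass="$VC_PASS" \
         --lwd="$LWD_IP" --vmname="$vm" --rpo="$rpo_min"; then
      echo "failed to enable capacity planning for $vm" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, enable_many 240 dbserver01 appserver01 would enable both VMs with a 4-hour RPO.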
Once capacity planning has been enabled for at least one VM, statistics can be viewed using a web browser. It takes 10-15 minutes before information begins to appear. To view this information, use the following URL:
https://<IP address of capacity planning appliance>:5480/vr-graphs/
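If you want to confirm from a shell that the graphs page is being served before opening a browser, a check like this may help. It is a sketch: the function names vr_graphs_url and vr_graphs_status are hypothetical, and it assumes the appliance presents a self-signed certificate, which is why curl’s -k option (skip certificate verification) is used; drop -k if the certificate is trusted.

```shell
#!/usr/bin/env bash
# Build the capacity planning graphs URL for a given appliance IP address.
vr_graphs_url() {
  echo "https://$1:5480/vr-graphs/"
}

# Print the HTTP status code returned by the graphs page.
# -k skips certificate verification, -s hides progress output,
# -o /dev/null discards the body, -w prints only the status code.
vr_graphs_status() {
  curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$(vr_graphs_url "$1")"
}
```

For example, vr_graphs_status 10.144.107.14 prints the status code from the appliance; a value of 000 means no HTTP response was received at all.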
You will see a list of the VMs you configured.
Clicking one of the VM names displays the graphs for that VM. You will see two columns: LWD Network Traffic and Delta Size. LWD stands for Light Weight Delta. A Light Weight Delta is essentially a “package” containing the data that has changed. When VR is first enabled, all of the data that makes up a VM must be replicated (or seeded) to the target location. After that, only changed data, in the form of Light Weight Deltas, is replicated to the target site.
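Because the initial seed sends all of the VM’s data, it is usually far larger than any single delta. A rough seeding-time estimate is data size divided by available bandwidth. This Bash sketch uses a hypothetical function name, seed_hours, and assumes decimal units (1 GB = 8000 megabits), a link that sustains the given rate for the whole transfer, and no protocol overhead.

```shell
#!/usr/bin/env bash
# Rough initial-seed estimate, rounded up to whole hours.
# Assumptions: size_gb is the amount of VM data to seed (1 GB = 8000 megabits),
# the link sustains link_mbps for the whole transfer, and overhead is ignored.
seed_hours() {
  local size_gb=$1 link_mbps=$2
  local megabits=$(( size_gb * 8000 ))
  local megabits_per_hour=$(( link_mbps * 3600 ))
  # Ceiling division so a partial hour counts as a full hour.
  echo $(( (megabits + megabits_per_hour - 1) / megabits_per_hour ))
}
```

For example, seed_hours 500 100 prints 12: 500GB is 4,000,000 megabits, which takes about 11.1 hours at 100Mbps.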
LWD Network Traffic and Delta Size metrics are available for four time periods: the most recent 4 hours, 1 day, 1 week, and 1 month. The Delta Size charts show the amount of data that would have been transmitted by VR, along with the average delta size. When reviewing LWD Network Traffic, use the “In LWD” metric, as this represents the amount of replicated data that would be received by a VR server and written to the target location. In addition to the LWD Network Traffic charts, the average and maximum bandwidth utilization are shown, along with the total amount of replicated data.
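To relate a total from these charts to your link speed, the total replicated data for a period can be converted into an average rate. This sketch assumes the total is read in MB using decimal units (1 MB = 8 megabits) and the period in hours; avg_mbps is a hypothetical name. An average hides peaks, so compare the maximum bandwidth figure against your available bandwidth as well.

```shell
#!/usr/bin/env bash
# Average rate in Mbps for a total of replicated data (MB) over a period (hours).
# Assumes decimal units (1 MB = 8 megabits); prints two decimal places.
avg_mbps() {
  awk -v mb="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", mb * 8 / (hours * 3600) }'
}
```

For example, avg_mbps 10800 24 prints 1.00: 10,800MB replicated over one day averages 1Mbps.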
Ideally, you would set up the capacity planning appliance (before setting up VR), enable it for the VMs you plan to replicate, and then let it run for at least a couple of weeks. That would provide useful averages and show peaks and patterns over longer amounts of time. After capacity planning is complete, you would then deploy VR using the same RPO settings with a good understanding of how these settings will impact network utilization.
Last, but not least, I will point out that this tool is a VMware Labs “fling”. It was developed by one of the engineers who works on vSphere Replication, but it is not supported. When you download the software, you must read and agree to the Technical Preview Agreement. Also understand that Flings are experimental and should not be run on production systems. You can find out more about the vSphere Replication Capacity Planning Appliance, and download it, from the VMware Labs Flings site.
@jhuntervmware