vSphere Replication Target Storage Consumption

vSphere Replication is an asynchronous, host-based replication feature that is included with vSphere Essentials Plus Kit and higher editions. It can be used as a standalone solution for simple, storage-agnostic, cost-effective virtual machine replication. vSphere Replication also serves as a replication component for VMware vCenter Site Recovery Manager (SRM) and VMware vCloud Air Disaster Recovery. When replication is configured for a powered-on virtual machine, vSphere Replication starts replicating the files that make up the virtual machine from the source location to the target location. A question that comes up from time to time is “How much storage will be consumed by the virtual machine at the target location?” As with many questions like this, the short answer is “It depends.” 🙂

What does this answer depend on? Several factors, actually. The first to consider is the configured capacity of the virtual disks (VMDK files) that make up the virtual machine. For example, a 100GB virtual disk that is thick provisioned will consume 100GB of storage capacity regardless of how much actual data is contained in the virtual disk. However, a nice feature in vSphere Replication is the option to have thin-provisioned virtual disks at the target location even if the source virtual disks are thick provisioned.

[Figure: vr_thin_provision]

Thin-provisioned virtual disks initially consume storage capacity equal to the amount of actual data on the virtual disk. For example, a virtual machine is deployed with a 100GB thin-provisioned virtual disk. An operating system (OS), application, and data are installed, which amounts to 50GB of data. That means the virtual disk consumes 50GB of storage capacity initially, not 100GB. However, as changes are made to this virtual disk (for example, OS patches, application upgrades, and new or changed data), the virtual disk will grow in size accordingly. It is possible for the thin-provisioned virtual disk to grow to a size equal to the configured capacity. That means our 100GB example virtual disk can grow to consume up to 100GB of storage capacity even though it is thin provisioned. Predicting when and how often a thin-provisioned virtual disk will grow is difficult, as it depends mainly on whether each write from the operating system to the file system updates a block with existing data or lands on a new block. The figure below shows another example – a couple of smaller virtual machines configured with 17GB virtual disks, each consuming less than 7GB of actual storage capacity because they are thin provisioned.
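To make the thick versus thin comparison concrete, here is a minimal Python sketch of the arithmetic described above. The function name `consumed_gb` and its parameters are made up for this illustration and are not part of any VMware tool. It also assumes a simplified model: metadata overhead is ignored, and a thin disk is treated as consuming exactly the data written to it, capped at the configured capacity.

```python
def consumed_gb(configured_gb: float, data_gb: float, thin: bool) -> float:
    """Rough storage consumed by one virtual disk at the target location.

    Simplified model (an assumption for illustration):
    - thick provisioned: always the full configured capacity
    - thin provisioned: the data actually written, never more than
      the configured capacity
    """
    if configured_gb <= 0 or data_gb < 0:
        raise ValueError("configured_gb must be > 0 and data_gb must be >= 0")
    if not thin:
        return configured_gb
    return min(data_gb, configured_gb)


# The 100GB disk holding 50GB of data from the example above.
thick = consumed_gb(100.0, 50.0, thin=False)
thin = consumed_gb(100.0, 50.0, thin=True)
print(f"thick: {thick} GB, thin: {thin} GB")
```

With the numbers from the example, the thick disk comes out at 100GB and the thin disk at 50GB. If the data on the thin disk keeps growing, the result levels off at the configured 100GB.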

[Figure: used_provisioned]

When replication is first configured, vSphere Replication performs a full sync – it sends all of the data that makes up the virtual machine to the target location to create the base disk of the replica. After the initial full sync, only changed data is replicated – this process is typically called a delta sync. While a delta sync is in progress, the replicated data is stored in one or more redo logs at the target location. Redo logs are used to preserve the integrity of the replica. Once a replication cycle is complete, a new redo log is created for the next replication cycle. The old redo log is consolidated into the base disk (or in some cases, into another redo log if multiple point-in-time recovery is enabled – more on that shortly).

[Figure: redo_log]

Naturally, these redo logs consume storage capacity. How much capacity depends on how much data is replicated and how often replication occurs. A lower RPO means more frequent replication. For example, a virtual machine disk with 100GB of data and a daily data change rate of 6% means 6GB of data would be replicated each day. If replication occurs only once per day (24-hour RPO), the redo log would grow to 6GB as replication is occurring. If the data change rate is consistent throughout the day and replication occurs six times per day (4-hour RPO), the redo log would grow to approximately 1GB each time replication occurs.
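The redo log arithmetic above can be sketched in a few lines of Python. The function name `redo_log_gb_per_cycle` is hypothetical, and the model assumes what the example assumes: the data change rate is spread evenly across the day, and every changed block counts once per cycle. Real workloads are burstier, so treat the result as a rough average rather than a guarantee.

```python
def redo_log_gb_per_cycle(data_gb: float,
                          daily_change_rate: float,
                          rpo_hours: float) -> float:
    """Approximate redo log growth during one replication cycle.

    Assumes (for illustration only) that changes are spread evenly
    over 24 hours and that replication runs once per RPO interval.
    """
    if rpo_hours <= 0 or rpo_hours > 24:
        raise ValueError("rpo_hours must be in the range (0, 24]")
    daily_change_gb = data_gb * daily_change_rate
    cycles_per_day = 24 / rpo_hours
    return daily_change_gb / cycles_per_day


# 100GB of data with a 6% daily change rate, as in the example above.
print(redo_log_gb_per_cycle(100, 0.06, rpo_hours=24))  # 6.0
print(redo_log_gb_per_cycle(100, 0.06, rpo_hours=4))   # 1.0
```

A lower RPO shrinks each redo log but does not reduce the total amount of data replicated per day under this model.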

vSphere Replication offers the option to retain multiple recovery points. For example, you can configure vSphere Replication to keep 3 recovery points per day for the past 5 days – a total of 15 recovery points. We will save the topic of how the multiple point-in-time (MPIT) engine determines which points in time to keep for another day/article. This article is focused on storage considerations. Each of these recovery points consists of a few files, including one or more redo logs, which are retained until the recovery point is expired by the retention policy. The example below shows MPIT configured to keep 24 recovery points for one day, which is why the modified dates and times are an hour apart.

[Figure: mpit_files]

As you can see, retaining these multiple recovery points utilizes varying amounts of storage capacity depending on how much data changed between each recovery point.
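As a rough illustration of the MPIT numbers, the sketch below counts recovery points for a retention setting and adds up the redo logs they hold. Both function names are made up for this example. The limit of 24 recovery points is the figure quoted in this article, so check it against the version you run. The per-point sizes are values you supply (for instance, from a file listing like the one above); nothing here queries vSphere Replication.

```python
from typing import Sequence

# Limit quoted in this article; verify it for your vSphere Replication version.
MAX_RECOVERY_POINTS = 24


def recovery_points(per_day: int, days: int) -> int:
    """Number of recovery points kept by a 'per day for N days' setting."""
    if per_day < 1 or days < 1:
        raise ValueError("per_day and days must be at least 1")
    total = per_day * days
    if total > MAX_RECOVERY_POINTS:
        raise ValueError(
            f"{total} recovery points exceeds the limit of {MAX_RECOVERY_POINTS}"
        )
    return total


def retained_redo_log_gb(redo_log_sizes_gb: Sequence[float]) -> float:
    """Total capacity used by the redo logs of all retained recovery points."""
    return sum(redo_log_sizes_gb)


print(recovery_points(per_day=3, days=5))  # 15
# Hypothetical redo log sizes (GB) for three retained recovery points.
print(retained_redo_log_gb([0.5, 1.25, 0.25]))  # 2.0
```

Because each recovery point only holds the data that changed since the previous one, the sizes in the list will usually differ from point to point.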

One more thing to be aware of: When testing a recovery plan with SRM and vSphere Replication, additional space will be consumed by each recovered virtual machine. Normally, redo logs are consolidated into the replica base disk or into other redo logs if MPIT is enabled. During an SRM test recovery, some or all of the redo logs may be in use until the test recovery is cleaned up (completed). If redo logs are in use, vSphere Replication cannot consolidate the redo logs. Replication continues during an SRM test recovery, which generates additional redo logs. Again, the amount of storage capacity consumed depends on factors such as data change rates, replication frequencies, and how long the SRM test recovery lasts.
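To get a feel for how test duration affects capacity, here is a small sketch that estimates the redo logs piling up while a test recovery is running. The function name `extra_gb_during_test` is hypothetical. It assumes an even data change rate and that none of the redo logs created during the test can be consolidated until cleanup, which is the worst case described above. It does not account for data written by the recovered test virtual machines themselves.

```python
def extra_gb_during_test(daily_change_gb: float, test_hours: float) -> float:
    """Redo log capacity accumulated while an SRM test recovery is running.

    Worst-case assumption for illustration: replication continues at an
    even rate and no redo log is consolidated until the test is cleaned up.
    """
    if daily_change_gb < 0 or test_hours < 0:
        raise ValueError("inputs must not be negative")
    return daily_change_gb * test_hours / 24


# A virtual machine that changes 6GB of data per day.
print(extra_gb_during_test(6, test_hours=4))   # 1.0
print(extra_gb_during_test(6, test_hours=72))  # 18.0
```

A four-hour test costs about 1GB for this virtual machine, while a three-day test costs about 18GB, which is why the length of the test is worth planning for.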

Generally speaking, I very rarely hear from customers that vSphere Replication consumes too much storage. It is fairly efficient at minimizing the amount of storage required at the target location. With the items above in mind, here are a few recommendations to properly size storage capacity and minimize storage consumption at the target location:

  1. Specify thin-provisioned disks when configuring replication.
  2. Try to estimate the maximum amount of data that will change between replication cycles. For example, virtual machine replication is configured with a 4-hour RPO. Very little data (less than 1GB) changes throughout the day. However, an ETL job runs before the last replication cycle of the day. This ETL job changes 50GB of data. That means the redo log for the last replication cycle will grow to 50GB before it is committed to the replica. Make sure there is enough storage capacity at the target location to accommodate the fluctuation in redo log sizes. One tool that may be useful in determining data change rate amounts and patterns in virtual machines is the vSphere Replication Capacity Planning Appliance.
  3. If MPIT recovery for virtual machines is enabled, set the number of recovery points as low as possible while still meeting business requirements. Example: Business policy requires 2 recovery points per day for 7 days = 14 recovery points. vSphere Replication supports up to 24 recovery points. Do not configure additional recovery points just because you can – keep it to the required 14 recovery points to minimize the additional storage required for MPIT recovery.
  4. When running an SRM test, consider how long the test recovery will last. If it is only for a few hours, a smaller amount of additional capacity will be needed. If the test will last a few days, more capacity will naturally be consumed.
  5. Precise numbers for the items above are difficult to provide, mainly because data change rates vary between applications and environments. Set vCenter Server alarms to monitor utilization of the datastores that contain vSphere Replication replicas, and configure them to alert administrators when free space gets too low.
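The recommendations above can be pulled together into a back-of-the-envelope sizing sketch. Everything in it is an assumption for illustration: the `ReplicaEstimate` class, the `free_space_low` helper, and the 20% free-space threshold are made up here and are not VMware defaults or APIs. The inputs are estimates you gather yourself, and the threshold check only mirrors the kind of condition a datastore alarm would watch for.

```python
from dataclasses import dataclass


@dataclass
class ReplicaEstimate:
    """Estimated target-side capacity for one replicated virtual machine (GB)."""
    base_disk_gb: float              # thin-provisioned replica base disk
    peak_redo_log_gb: float          # largest redo log expected in one cycle
    mpit_redo_logs_gb: float = 0.0   # redo logs held by retained recovery points
    test_recovery_gb: float = 0.0    # extra redo logs during an SRM test

    def total_gb(self) -> float:
        return (self.base_disk_gb + self.peak_redo_log_gb
                + self.mpit_redo_logs_gb + self.test_recovery_gb)


def free_space_low(capacity_gb: float, used_gb: float,
                   min_free_fraction: float = 0.2) -> bool:
    """True when free space falls below the chosen fraction of capacity.

    The 0.2 default is an arbitrary example value, not a recommendation.
    """
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be > 0")
    return (capacity_gb - used_gb) / capacity_gb < min_free_fraction


# Hypothetical VM: 100GB of data, a 50GB ETL burst, some MPIT and test overhead.
vm = ReplicaEstimate(base_disk_gb=100.0, peak_redo_log_gb=50.0,
                     mpit_redo_logs_gb=14.0, test_recovery_gb=6.0)
print(vm.total_gb())                                     # 170.0
print(free_space_low(capacity_gb=1000.0, used_gb=850.0))  # True
```

Sizing for the peak redo log rather than the average is the point of recommendation 2: the 50GB ETL burst dominates the estimate even though most cycles replicate far less.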

@jhuntervmware