By Joanna Guan and Davide Bergamasco
In a prior post we assessed the performance of VMware vSphere Content Library, focusing on the instantiation of a virtual machine from an existing library. We considered various scenarios and provided virtual infrastructure administrators with some guidelines about the selection of the most efficient storage backing based on a cost/performance trade-off. In this post we focus on another Content Library operation, namely Synchronization, with the goal of providing similar guidelines. We also cover two Content Library maintenance operations, Import and Export.
Once a library is created and published, its content can be shared between different vCenter Servers. This is achieved through the synchronization operation, which clones a published library by downloading all the content to a subscribed library. Multiple scenarios exist based on the vCenter Server connectivity and backing storage configurations. Each scenario will be discussed in detail in the following sections after a brief description of the experimental testbed.
For our experiments we used a total of four ESXi hosts: two running the vCenter Server appliances (one hosting the published library and the other the subscribed library) and two providing the datastore backing for the libraries. Each of the latter two hosts is managed by one of the vCenter Servers. The following table summarizes the hardware and software specifications of the test bed.
|ESXi Hosts Running the vCenter Server Appliances|
|Server||Dell PowerEdge R910|
|CPUs||Four 6-core Intel® Xeon® E7530 @ 1.87 GHz, Hyper-Threading enabled|
|Virtualization Platform||VMware vSphere 6.0 (RTM build # 2494585)|
|VM Configuration||VMware vCenter Server Appliance 6.0 (RTM build # 2559277), 16 vCPUs and 32GB RAM|
|ESXi Hosts Providing Datastore Backing|
|Server||Dell PowerEdge R610|
|CPUs||Two 4-core Intel® Xeon® E5530 @ 2.40 GHz, Hyper-Threading enabled|
|Virtualization Platform||VMware vSphere 6.0 (RTM build # 2494585)|
|Storage Adapter||QLogic ISP2532 Dual-Port 8Gb Fibre Channel HBA|
|Network Adapters||QLogic NetXtreme II BCM5709 1000Base-T (Data Rate: 1Gbps); Intel 82599EB 10-Gigabit SFI/SFP+ (Data Rate: 10Gbps)|
|Storage||EMC VNX5700 Storage Array exposing two 20-disk RAID-5 LUNs with a capacity of 12TB each|
Single-item Library Synchronization
When multiple vCenter Servers are part of the same SSO domain they can be managed as a single entity. This feature is called Enhanced Linked Mode (see this post for a discussion on how to configure Enhanced Linked Mode). In an environment where Enhanced Linked Mode is available, the contents of a published library residing under one vCenter Server can be synchronized to a subscribed library residing under another vCenter Server by directly copying the files from the source datastore to the destination datastore (this is possible provided that the ESXi hosts connected to those datastores have direct network connectivity).
When Enhanced Linked Mode is not available, the contents of a published library will have to be streamed through the Content Library Transfer Service components residing on each vCenter server (see the prior post in this series for a brief description of the Content Library architecture). In this case, three sub-scenarios exist based on the storage configuration: (a) both published and subscribed library reside on a datastore, (b) both published and subscribed library reside on an NFS file system mounted on the vCenter servers, and (c) the published library resides on an NFS file system while the subscribed library resides on a datastore. The four scenarios discussed above are depicted in Figure 1.
Figure 1. Library synchronization experimental scenarios and related data flows.
For each of the four scenarios we synchronized the contents of a published library to a subscribed library and measured the completion time of this operation. The published library contained only one item, a 5.4GB OVF template containing a Red Hat virtual machine in compressed format (the uncompressed size is 15GB). The following table summarizes the four experiments.
|Experiment 1||The published and subscribed libraries reside under different vCenter Servers with Enhanced Linked Mode; both libraries are backed by datastores.|
|Experiment 2||The published and subscribed libraries reside under different vCenter Servers without Enhanced Linked Mode; both libraries are backed by datastores.|
|Experiment 3||The published and subscribed libraries reside under different vCenter Servers without Enhanced Linked Mode; both libraries are backed by NFS file systems, one mounted on each vCenter server.|
|Experiment 4||The published and subscribed libraries reside under different vCenter Servers without Enhanced Linked Mode; the published library is backed by an NFS file system while the subscribed library is backed by a datastore.|
For all the experiments above we used both 1GbE and 10GbE network connections to study the effect of network capacity on synchronization performance. In the scenarios of Experiments 2 through 4 the Transfer Service is used to stream data from the published library to the subscribed library. This service leverages a component in vCenter Server called rhttpproxy whose purpose is to offload the encryption/decryption of SSL traffic from the vCenter Server web server (see Figure 2). To study the performance impact of data encryption/decryption in those scenarios, we ran the experiments twice, the first time with rhttpproxy enabled (the default case), and the second time with rhttpproxy disabled (thus reducing security by transferring the content “in the clear”).
Figure 2. Reverse HTTP Proxy.
The results of the experiments outlined above are shown in Figure 3 and summarized in the table below.
|Experiment 1||The datastore to datastore with Enhanced Linked Mode scenario is the fastest of the four, with a sync completion time of 105 seconds (1.75 minutes). This is because the data path is the shortest (the two ESXi hosts are directly connected) and there is no data compression/decompression overhead because content on datastores is stored in uncompressed format. When a 10GbE network is used, the library sync completion time is significantly shorter (63 seconds). This suggests that the 1GbE connection between the two hosts is a bottleneck for this scenario.|
|Experiment 2||The datastore to datastore without Enhanced Linked Mode scenario is the slowest, with a sync completion time of 691 seconds (more than 11 minutes). The content must be streamed through the Transfer Service between the two sites, and it must also be compressed at the source and decompressed at the destination as it crosses the network link between the two vCenter Servers. Using a 10GbE network in this scenario has no measurable effect since most of the overhead comes from data compression/decompression. Disabling rhttpproxy likewise has only a marginal effect, for the same reason.|
|Experiment 3||The NFS file system to NFS file system scenario is the second fastest scenario with a sync completion time of 274 seconds (about 4.5 minutes). Although the transfer path has the same number of hops as the previous scenario, it does not incur the data compression and decompression overhead because the content is already stored in a compressed format on the mounted NFS file systems. Using a 10GbE network in this scenario leads to a substantial improvement in the completion time (more than halved). An even more significant improvement is achieved by disabling rhttpproxy. The combined effect of these two factors yields a 3.7x reduction in the synchronization completion time. These results imply that for this scenario both the 1GbE network and the use of HTTPS for data transfer are substantial performance bottlenecks.|
|Experiment 4||The NFS file system to datastore is the third fastest scenario with a sync completion time of 298 seconds (just under 5 minutes). In this scenario the Transfer Service at the subscribed vCenter needs to decompress the files, because content on datastores is stored uncompressed, while the Transfer Service at the published vCenter does not need to compress them, because content on mounted NFS file systems is already compressed. Since data decompression has a substantially smaller overhead than compression, this scenario achieves much better performance than Experiment 2. Using a 10GbE network and disabling rhttpproxy in this scenario has the same effects as in Experiment 3 (that is, a 3.7x reduction in completion time).|
Figure 3. Library synchronization completion times.
The above experiments clearly show that there are a number of factors affecting library synchronization performance:
- Type of data path: direct connection vs. streaming;
- Network capacity;
- Data compression/decompression;
- Data encryption/decryption.
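The asymmetry between compression and decompression cost noted above is easy to reproduce in isolation. The sketch below is a hypothetical micro-benchmark (it is not part of the Content Library code, and the payload is synthetic) that times zlib compression and decompression of the same buffer; on most systems compression is noticeably slower.

```python
import time
import zlib

# Synthetic, moderately compressible payload standing in for VMDK data.
payload = (b"virtual machine disk block " * 4096 + bytes(range(256))) * 40

start = time.perf_counter()
compressed = zlib.compress(payload, 6)  # level 6 is zlib's default trade-off
compress_time = time.perf_counter() - start

start = time.perf_counter()
restored = zlib.decompress(compressed)
decompress_time = time.perf_counter() - start

assert restored == payload  # the round trip must be lossless
print(f"compress:   {compress_time * 1000:.1f} ms")
print(f"decompress: {decompress_time * 1000:.1f} ms")
print(f"ratio: {len(payload) / len(compressed):.1f}x")
```

This asymmetry is why Experiment 4 (decompress only) fares so much better than Experiment 2 (compress and decompress).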
The following recommendations translate these observations into a set of actionable steps to help vSphere administrators optimize Content Library synchronization performance.
- The best performance can be obtained if Enhanced Linked Mode is available and both the published and subscribed libraries are backed by datastores.
- When Enhanced Linked Mode is not available, avoid datastore-to-datastore synchronization. If no other optimization is possible, place the published library on an NFS file system (notice that for best deployment performance the subscribed library/libraries should be backed by a datastore as discussed in the prior post).
- Using a 10GbE network is always beneficial for synchronization performance (except in the datastore-to-datastore synchronization without Enhanced Linked Mode).
- If data confidentiality is not required, the overhead of the HTTPS transport can be avoided by disabling rhttpproxy as described in VMware Knowledge Base article KB2112692.
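As a rough mental model, a streamed synchronization behaves like a pipeline whose sustained rate is set by its slowest stage. The sketch below estimates completion time from per-stage throughputs; the stage names and all the MB/s figures are made-up assumptions for illustration, not measurements from these experiments.

```python
def sync_time_seconds(uncompressed_gb: float, stage_throughputs_mbps: dict) -> float:
    """Estimate streamed sync time as data size divided by the slowest stage (MB/s)."""
    bottleneck = min(stage_throughputs_mbps.values())
    return uncompressed_gb * 1024 / bottleneck

# Hypothetical stage throughputs for a datastore-to-datastore sync without
# Enhanced Linked Mode, where compression dominates (as in Experiment 2).
stages = {
    "source read": 200.0,
    "compression": 25.0,    # assumed CPU-bound compressor
    "1GbE network": 120.0,
    "decompression": 90.0,
    "destination write": 180.0,
}
estimate = sync_time_seconds(15, stages)
print(f"estimated sync time: {estimate:.0f} s")  # limited by the compression stage
```

Under these assumed numbers, upgrading the network stage alone changes nothing, which mirrors the observation that 10GbE did not help Experiment 2.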
Concurrent Library Synchronization
To assess the performance of a concurrent library synchronization operation where multiple items are copied in parallel, we devised an experiment where a subscribed library is synchronized with a published library that contains an increasing number of items from 1 to 10. The source and destination vCenter servers support Enhanced Linked Mode and the two libraries are backed by datastores. Each item is an OVF template containing a Windows virtual machine with a 41GB flat VMDK file. Each vCenter server manages one cluster with two ESXi hosts, as shown in Figure 4.
Figure 4. Concurrent library synchronization.
We studied two scenarios depending on the network speed. With a 1GbE network, we observed that each file transfer always saturates the network bandwidth between the two ESXi hosts, as we expected. Because each site has two ESXi hosts per vCenter, the library synchronization can use two pairs of ESXi hosts for transferring two files concurrently. As shown by the blue line in Figure 5, the library synchronization completion time is virtually the same for one or two items, suggesting that two items are effectively transferred concurrently. When the number of library items is larger than two, the completion time increases linearly, indicating that the extra file transfers are queued while the network is busy with prior transfers.
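The flat-then-linear behavior described above is captured by a simple batching model: with c concurrent transfer channels and a per-item transfer time t, n items complete in ceil(n / c) × t. The sketch below is illustrative only; the per-item time is a hypothetical number, not a measurement.

```python
import math

def sync_completion_time(items: int, channels: int, per_item_seconds: float) -> float:
    """Items beyond the channel count queue up and run in batches of `channels`."""
    return math.ceil(items / channels) * per_item_seconds

# Two ESXi host pairs give two concurrent channels, as in the 1GbE experiment.
t = 410.0  # hypothetical per-item transfer time in seconds
assert sync_completion_time(1, 2, t) == sync_completion_time(2, 2, t)  # flat up to 2 items
assert sync_completion_time(4, 2, t) == 2 * t                          # then linear growth
for n in range(1, 11):
    print(n, sync_completion_time(n, 2, t))
```

The model also makes the scaling recommendation below concrete: adding a third host pair (channels=3) flattens the curve up to three items.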
With a 10GbE network we observed a different behavior. The synchronization operations were faster than in the prior experiment, but the network bandwidth was not completely saturated. This is because at a higher transfer rate the bottleneck was our storage subsystem. This bottleneck became more pronounced as more and more items were synchronized concurrently, due to an increasingly random access pattern on the disk subsystem. This resulted in a super-linear curve (red line in Figure 5), which should eventually become linear once the network bandwidth is saturated.
The conclusion is that, with a 1GbE network, adding more network interface cards to the ESXi hosts to increase the number of available transfer channels (or alternatively adding more ESXi hosts to each site) will increase the total file transfer throughput and consequently decrease the synchronization completion time. Notice that this approach works only if there is sufficient bisection bandwidth between the two sites. Any networking bottleneck between them, like a slower WAN link, will limit, if not defeat, the transfer concurrency.
With a 10GbE network, unless very capable storage subsystems are available both at the published and subscribed library sites, the network capacity should be sufficient to accommodate a large number of concurrent transfers.
Figure 5. Concurrent synchronization completion times.
Library Import and Export
The Content Library Import function allows administrators to upload content from a local system or web server to a content library. This function is used to populate a new library or add content to an existing one. The symmetrical Export function allows administrators to download content from a library to a local system. This function can be used to update content in a library by downloading it, modifying it, and eventually importing it again to the same library.
As in the prior experiments, we studied a few scenarios with different library storage backing and network connectivity configurations to find out which one delivers the shortest completion times. In our experiments we focused on the import/export of a virtual machine template in OVF format with a size of 5.4GB (the VMDK file size is 15GB in uncompressed format). As we did earlier, we assessed the performance impact of the rhttpproxy component by running experiments with and without it.
We consider the six scenarios summarized in the following table and illustrated in Figure 6.
|Experiment 5||Exporting content from a library backed by a datastore. The OVF template is stored uncompressed, using 15GB of space.|
|Experiment 6||Exporting content from a library backed by an NFS filesystem mounted on the vCenter Server. The OVF template is stored compressed using 5.4GB of space.|
|Experiment 7/9||Importing content into a library backed by a datastore. The OVF template is stored either on a Windows system running the upload client (Experiment 7) or on a Web Server (Experiment 9). In both cases the data is stored in compressed format using 5.4GB of space.|
|Experiment 8/10||Importing content into a library backed by an NFS filesystem mounted on the vCenter Server. The OVF template is stored either on a Windows system running the upload client (Experiment 8) or on a Web Server (Experiment 10). In both cases the data is stored in compressed format using 5.4GB of space.|
Figure 6. Import/Export storage configurations and data flows.
Figure 7 shows the results of the six experiments described above in terms of Import/Export completion time (lower is better), while the following table summarizes the main observations for each experiment.
|Experiment 5||This is the most unfavorable scenario for content export because the data goes through the ESXi host and the vCenter Server, where it gets compressed before being sent to the download client. Using a 10GbE network or disabling rhttpproxy doesn’t help very much because, as we have already observed, data compression is the largest performance limiter.|
|Experiment 6||Exporting a library item from an NFS filesystem is instead the most favorable scenario. The data is already in compressed format on the NFS filesystem, so no compression is required during the download. Disabling rhttpproxy also has a large impact on the data transfer speed, yielding an improvement of about 44%. Using a 10GbE network, however, does not result in additional improvements because, after removing the encryption/decryption bottleneck, we face another limiter, a checksum operation. In fact, in order to ensure data integrity during the transfer, a checksum is computed on the data as it goes through the Transfer Service. This is another CPU-heavy operation, albeit somewhat lighter than data compression and encryption.|
|Experiment 7/9||Importing content into a library backed by a datastore from an upload client (Experiment 7) is clearly limited by the network capacity when 1GbE connections are used. In fact, the completion time virtually does not change when rhttpproxy is disabled. Performance improves when 10GbE connections are employed, and further improvements are observed when rhttpproxy is disabled. This suggests that data encryption/decryption becomes a real bottleneck only with the larger network capacity. When a web server is used to host the library item to be imported (Experiment 9), we observe a completion time which is more than halved compared to Experiment 7. There are two reasons for this: (1) the Transfer Service bypasses rhttpproxy when importing content from a web server (this is why there are no “rhttpproxy disabled” data points for Experiments 9 and 10 in Figure 7), and (2) the web server is more efficient at transferring data than the Windows client VM. Using a 10GbE connection results in a further improvement; given that import performance improves by only 10%, however, another limiter must be present. This limiter is the decompression of the library item while it is being streamed to the destination datastore.|
|Experiment 8/10||When content is imported into a library backed by an NFS filesystem we see a pattern very similar to the one observed in Experiments 7 and 9. The completion times in Experiments 8 and 10 are slightly better because in this case no decompression is performed, as the data is stored in compressed format on the NFS filesystem. The notable exception is the “10GbE Network” data point in Experiment 10, which is about 44% better than in Experiment 9: once all the other limiters have been removed, avoiding data decompression makes a much more significant difference to import performance.|
Figure 7. Import/Export completion times.
In this last experiment, we assess the performance of Content Library in terms of network throughput when multiple users simultaneously export (download) the 5.4GB OVF template used in the prior experiments. The content is redirected to a NULL device on the download client in order to factor out a potential bottleneck in the client storage stack. The library backing is an NFS filesystem mounted on the vCenter server.
Figure 8 shows the aggregate export throughput (higher is better) as the number of concurrent export operations increases from 1 to 10 in four different scenarios depending on the network speed and the use of rhttpproxy. When export traffic goes through the rhttpproxy component, the speed of the network seems to be irrelevant, as we get exactly the same throughput (which saturates at around 90 MB/s) with both the 1GbE and 10GbE networks. This once again confirms that rhttpproxy, due to the CPU-intensive SSL data encryption, creates a bottleneck on the data transfer path.
When rhttpproxy is turned off, the download throughput increases until the link capacity is completely saturated (about 120MB/s), at least with a 1GbE network. Once again, administrators can trade off security for performance by disabling rhttpproxy as explained earlier.
When a 10GbE network is used, however, throughput saturates at around 450 MB/s instead of climbing all the way up to 1200MB/s (the theoretical capacity of a 10 Gbps Ethernet link). This is because the data transfer path, when operating at higher rates, hits another bottleneck introduced by the checksum operation performed by the Transfer Agent to ensure data integrity. Generating a data checksum is another computationally intensive operation, even though not as heavy as data encryption.
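The checksum cost is straightforward to gauge in isolation. The sketch below is a standalone micro-benchmark; SHA-1 is used here only as an example of a digest, since the article does not state which checksum algorithm the Transfer Service actually uses.

```python
import hashlib
import time

buf = bytes(64 * 1024) * 1024  # 64 MB of zeroed data to checksum

start = time.perf_counter()
digest = hashlib.sha1(buf).hexdigest()
elapsed = time.perf_counter() - start

throughput_mbps = len(buf) / (1024 * 1024) / elapsed
print(f"single-core SHA-1 throughput: {throughput_mbps:.0f} MB/s")
```

A single core that checksums data at a few hundred MB/s is enough to saturate a 1GbE link (about 120MB/s) but not a 10GbE link, which is consistent with the throughput plateau observed well below the link's theoretical capacity.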
Figure 8. Concurrent export throughput.