
Monthly Archives: June 2012

Advanced vSphere Replication Options for Single VM Replication Performance


Posted by Ken Werneburg, Tech Marketing (Twitter: @vmKen)

Lots of people stand up a vSphere Replication-based SRM instance and check to see “how fast” it can replicate a VM.  What I want to do today is talk about some of the factors that go into this, and give you some advice on using unsupported advanced features to get faster replication of a single VMDK.

Why would I do this?  It’s unsupported!  You can saturate your links!  You can create a vortex that will cause implosion of the Milky Way into a singularity!  So the point is, you really *shouldn’t* do this.  Unfortunately, though, people are doing incorrect assessments based on a misunderstanding of how VR works, and I want to give you some understanding and a potential way to do some more interesting testing of your own.

The first point to note is that there are two advanced settings that control, in essence, how much data is buffered before sending and how much is sent in parallel. These limits are in place to keep an individual large replication from saturating a pipe and causing other replications to fail, and to protect the host from being saturated by nothing other than replication traffic.

The effect of this, when replicating a single VM, is that you see a smaller amount of data getting replicated, more slowly, than you think it should be.  Fundamentally, vSphere Replication is optimized for parallel replication of multiple VMs rather than peak throughput for a single disk.

So let’s talk about the key advanced settings:

HBR.TransferDiskMaxBufferCount.  This is the maximum number of 8K blocks held in memory during replication, for each replicated disk.  The buffers are basically the set of blocks from a transfer that we can operate on at any one point: if our delta has, say, 100 blocks that need to be shipped, only this many of them are loaded into memory at a time for replication to work on.  This defaults to 8.

HBR.TransferDiskMaxExtentCount.  This is the total number of outstanding parallel blocks that have been sent to the receiving VRS, but whose writes have not yet been acknowledged by the VRS with a reply back to the source host.  This defaults to 16.

So for example, if we have 20 blocks to send as part of a single light-weight delta for a single VMDK, we will take the first 8 blocks into the buffers and send them out.  This gives us 8 buffers in use, and 8 active extents “on-the-wire”.

As a block is sent, it is no longer needed in the buffer, so the buffer gets cleared and reused for the next block that needs to be shipped.  Note, however, that a block may not have completed the actual transfer and write at the target site yet; because it has been sent, the buffer can nonetheless be cleared.  The 8 buffers are constantly being filled and their blocks shipped, and each buffer loads the next block as soon as it has finished shipping the previous one.

The extent count, however, increments each time a buffer is emptied by a block transfer, and it is only reduced once the target site VRS has replied that it has received and written the block.  So there is a delay between the time a buffer is emptied and the time the extent for that block’s replication is cleared.

For example, we have 8 of our 20 blocks in the buffers, ready to be sent.  Let’s imagine blocks 1 through 4 are sent simultaneously and the buffer for each of those blocks is emptied.  We now load blocks 9 through 12 into the freed buffers and ship them out too, at the same time as the original blocks 5 through 8 complete.  At this point 12 blocks have been sent, so our extent count is now up to 12.  As the original blocks 5 through 8 complete their send, we load blocks 13 through 16 into the buffers and ship them as well.

That brings us to 16 blocks in transit at once, which is the maximum parallel extent count.  So the last blocks from the original LWD (17 through 20) may get loaded into the buffers, but will not get shipped until some of our active extents get committed.

As blocks are received and written out at the recovery site by the VRS, it sends back, in essence, an acknowledgement that clears the extents associated with that transfer.  Say the first 7 blocks it received get written at the recovery site: that frees up 7 extents on the sending host, and the final 4 blocks queued in the buffers can now be sent.

So there is an interplay between the total number of active blocks that are shipped at any one time, as dictated by the buffer count, and the total number of blocks that are shipped but not yet committed, which is the extent count.
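
To make the buffer/extent interplay concrete, here is a purely illustrative Python sketch of the bookkeeping described above.  It is a toy model under my own assumptions, not the actual hbr agent code, but it walks the same 20-block delta through the default limits:

    # Purely illustrative model of the bookkeeping described above -- not the real agent.
    MAX_BUFFERS = 8     # HBR.TransferDiskMaxBufferCount (default)
    MAX_EXTENTS = 16    # HBR.TransferDiskMaxExtentCount (default)

    delta = list(range(1, 21))              # the 20 blocks of the light-weight delta
    buffered, in_flight, acked = [], [], []

    def pump():
        """Keep loading free buffers and sending until both limits are hit."""
        progress = True
        while progress:
            progress = False
            while delta and len(buffered) < MAX_BUFFERS:
                buffered.append(delta.pop(0))        # block loaded into host memory
                progress = True
            while buffered and len(in_flight) < MAX_EXTENTS:
                in_flight.append(buffered.pop(0))    # buffer freed as soon as the block is sent
                progress = True

    def ack(n):
        """The recovery-site VRS writes n blocks and acknowledges them, freeing extents."""
        for _ in range(min(n, len(in_flight))):
            acked.append(in_flight.pop(0))

    pump()
    print(len(in_flight), "blocks in flight,", len(buffered), "waiting in buffers")
    # -> 16 blocks in flight, 4 waiting in buffers: the extent count is the ceiling

    ack(7)    # the VRS commits the first 7 blocks it received
    pump()    # ...which lets the last 4 buffered blocks ship
    print(len(in_flight), "blocks in flight,", len(acked), "acknowledged")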

Phew!  So far so good?

If we presume a single VMDK is being replicated, let’s see how much throughput we can expect.  The variables are:

  • Number of replicated disks = n
  • Blocks in flight per disk = b (bounded by the extent count, default 16)
  • Block size = s (8K)
  • Round-trip latency = l

The calculation is basically n * b * s / l = theoretical throughput in Mbps; knock roughly 20% off that for real-world overhead (more on this below).

1 disk * 16 blocks in flight * 8K per block / 100ms = 10.24 Mbps.  Adjust your latency up and down as needed.  If replicating across the street with <1ms latency, you could get, say, 1 Gbps.

This is the top-end theoretical number, with no overhead and no write latency at the target site.  So let’s add in 100ms of distance latency, about 3ms of write latency at the target, and take ~20% off for TCP/IP & congestion overhead, acknowledgement responses, etc…

Long distance: 1 disk * 16 blocks * 8K per block / 103ms = ~10 Mbps, * 0.8 = ~8 Mbps.

So… how do we speed it up?  By adding more disks!  Obviously, if we change the number of disks we should expect, all other things being equal, a linear progression in replication traffic.

3 disks * 16 blocks * 8K per block / 103ms = ~30 Mbps, * 0.8 = ~24 Mbps.

Very straightforward.  If you look back at the results shared by my friends at Hosting.com a few weeks back, you’ll find they had a very similar set of results, somewhere on the order of 11 Mbps per VM on a link with 55ms latency.

1 disk * 16 blocks * 8K per block / 55ms = ~18 Mbps, * 0.8 = ~14 Mbps.  Their results were not too far off these simplistic theoretical averages based on our algorithm.

3 disks * 16 blocks * 8K per block / 55ms = ~56 Mbps, * 0.8 = ~45 Mbps.  Again, fairly close to what they saw in practice.
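
If you want to play with this arithmetic yourself, the simplistic model fits in a few lines of Python.  This is just the back-of-the-envelope formula from this post, not a measurement of real traffic:

    def vr_throughput_mbps(disks, latency_ms, blocks_in_flight=16, block_kb=8, overhead=0.8):
        # Megabits moved per round trip: disks * in-flight blocks * 8K blocks.
        mbits_per_round_trip = disks * blocks_in_flight * block_kb * 8 / 1000.0
        # Divide by the round-trip time, then knock ~20% off for protocol overhead.
        return mbits_per_round_trip / (latency_ms / 1000.0) * overhead

    print(vr_throughput_mbps(1, 100, overhead=1.0))   # 10.24 -- the raw example above
    print(vr_throughput_mbps(1, 103))                 # ~8 Mbps: long distance, 1 disk
    print(vr_throughput_mbps(3, 103))                 # ~24 Mbps: long distance, 3 disks
    print(vr_throughput_mbps(1, 55))                  # ~14-15 Mbps: the Hosting.com-style link
    print(vr_throughput_mbps(3, 55))                  # ~45 Mbps: 3 disks on that link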

So back to the original point: how do we try to make vSphere Replication “go faster”?  Well, the easiest way is to add more VMs to the replication.  It doesn’t speed up an individual VM’s replication, as that is capped by latency, the block size, and the buffer and extent counts.  But it does make more use of your bandwidth.

What if you don’t want to replicate more VMs, and just want to make one VM replicate faster?  Change the variables… let’s give each disk more buffers and a larger extent count!

Back to the caveat, however… this is *not supported* by VMware.  These values can be changed, but the change has not undergone our usual rigorous testing scenarios, so you may run into issues when you saturate the network, saturate the VRS that receives the blocks, or consume too much host memory… Do this with great reluctance!

So to make it go faster, we simply need to adjust the advanced settings listed above on each host from which you are replicating.  Remember that vSphere Replication’s agent lives in the kernel, so the settings are in the host’s advanced settings, under “HBR”, not in SRM’s advanced settings.  This also requires you to reboot your hosts after making the change.

[Screenshot: host Advanced Settings, HBR options]

Adjust them cautiously, and by a similar ratio.  If you double one, you should double the other, lest you end up in a situation where, for example, you have lots of free extents but your buffers are saturated, or vice versa.  A cautious approach might be to set MaxBufferCount to 16 and MaxExtentCount to 32 – this should give you close to a linear doubling of throughput.
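
If you would rather script the change than click through each host, a rough pyVmomi sketch along these lines should do it.  The vCenter name, credentials and host name below are placeholders, and it assumes the HBR options are exposed through the host’s standard advanced-settings OptionManager; all of the caveats (and the reboot requirement) above still apply:

    # Unsupported tinkering: raise the HBR buffer/extent ceilings on one host.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret",
                      sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder,
                                                       [vim.HostSystem], True)
        for host in view.view:
            if host.name != "esxi01.example.com":     # only the source (replicating) host
                continue
            adv = host.configManager.advancedOption   # the host's Advanced Settings
            adv.UpdateOptions(changedValue=[
                # Value types may need adjusting depending on your SDK version.
                vim.option.OptionValue(key="HBR.TransferDiskMaxBufferCount", value=16),
                vim.option.OptionValue(key="HBR.TransferDiskMaxExtentCount", value=32),
            ])
    finally:
        Disconnect(si)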

Again, the cautions here are that you could overuse memory on the host (for the buffers), you could saturate the VRS at the receiving side, you could do all sorts of unpleasant things, and this has not gone through testing, so it is completely unsupported.  It will also affect all replications on the host, so don’t do this in a production environment!  But in a lab environment, with only one or two VMs being replicated, it may be worth checking out.

Hopefully this helps you understand a bit more about how vSphere Replication works: it is designed for parallel operation across multiple VMs, but it can be tweaked to improve single-VM replication performance in lab environments.

-Ken

**** EDIT ****

I missed a decimal in one of my calculations…

VMX Swap feature in vSphere 5.0

Something you might have missed… a little-known feature of vSphere 5.0 called VMX swap helps reduce the memory footprint of the VMX process by up to 80%. The vSphere kernel has several internal processes, agents, and data structures that require physical memory. vSphere reserves physical memory on the host to assure that these internal processes always have the physical memory they need. Sometimes these reservations are more than the process is actually using at the moment. The VMX process manages the data structures needed to bootstrap and support the guest and has a memory reservation of approximately 50MB, though this can vary depending on virtual machine size and configuration.

To allow for better utilization of physical host memory and higher consolidation levels, vSphere 5.0 introduced the ability to create a swap location for the VMX process for a virtual machine. By having a swap location backing the memory for the VMX process, vSphere can reduce the physical memory consumed by the VMX process by allowing parts of the memory to be swapped out when the host is under memory pressure.

This feature is enabled by default in vSphere 5.0, and vSphere will automatically create the VMX swap file for each virtual machine when the virtual machine is powered on. By default, the VMX swap file is created in the virtual machine’s working directory, but a different location can be set with sched.swap.vmxSwapDir. Although not recommended, the VMX swap feature can be disabled on a virtual machine by setting sched.swap.vmxSwapEnabled to FALSE.
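
Both values are just per-VM configuration parameters, so they can also be set programmatically. A rough pyVmomi sketch is below; the VM name and datastore path are placeholders, and again, disabling the feature is not recommended:

    # Rough sketch: set the VMX swap parameters on one VM via extraConfig.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Pass an sslContext to SmartConnect if vCenter uses a self-signed certificate.
    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret")
    try:
        content = si.RetrieveContent()
        vm = content.searchIndex.FindByDnsName(dnsName="testvm01.example.com", vmSearch=True)

        spec = vim.vm.ConfigSpec(extraConfig=[
            # Point the VMX swap file at a different datastore folder...
            vim.option.OptionValue(key="sched.swap.vmxSwapDir",
                                   value="/vmfs/volumes/fast-datastore/vmx-swap"),
            # ...or disable the feature entirely (not recommended).
            vim.option.OptionValue(key="sched.swap.vmxSwapEnabled", value="FALSE"),
        ])
        vm.ReconfigVM_Task(spec=spec)   # takes effect the next time the VM powers on
    finally:
        Disconnect(si)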

This feature can reduce the VMX memory reservation from about 50MB or more per virtual machine to about 10MB per virtual machine with no noticeable performance impact. Although a savings of 40MB might not seem like a huge amount, with today’s hosts supporting higher levels of consolidation and running more and more virtual machines, every little bit helps and it can add up to some big savings. This is especially true for environments where a very large number of virtual machines run on a single host, for instance VMware View / VDI environments where potentially hundreds of virtual machines, each with its own VMX process, run on a single host.

For more details on the VMX swap feature and other performance features and best practices for vSphere 5.0 that you might have missed, see the vSphere 5.0 Performance Best Practices white paper: http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf

 

Viewing ESXi Logs From the DCUI

Posted 28 June 2012 by Kyle Gleed, Sr. Technical Marketing Architect, VMware

I recently had a customer ask if I had any tips for viewing log files from the DCUI.  He mentioned that he had recently been trying to track down an issue but struggled with the DCUI "View System Logs" feature.  He commented that it was painful having to scroll through the files using the arrow keys and that he wasn't able to make any sense of the help screen.  He was a long-time Windows admin who had never worked with UNIX/Linux and was not familiar with the "vi"-like interface of the "less" pager, which is what the DCUI uses to display log files.  As I'm sure he's not the only one who's been frustrated by this, I thought it would be nice to give a quick tutorial on how to navigate log files from the DCUI.
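
In the meantime, here is the short version.  The DCUI log viewer behaves like the standard "less" pager, so (assuming the usual less key bindings) a handful of keystrokes cover most of what you need:

  • Spacebar / b – page down / page up
  • g / G – jump to the beginning / end of the log
  • /storage – search forward for "storage" (n repeats the search, N searches backward)
  • Shift-F – follow the log as new entries arrive (Ctrl-C stops following)
  • q – quit the viewer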

Continue reading

Storage IO Control and SRM Planned Migration

We have a known behaviour, documented in this KB article and also in this KB article, dealing with unmounting a datastore that is using Storage IO Control.

At its core, SIOC performs rapid checks against the datastore to test current storage conditions, including populating an always-open file (iormstats.sf) with latency data used to help determine actions.  These operations in essence block the unmount/detach process, which I imagine is a safety measure to protect hostd.

In normal operations this is not a big deal – an administrator can disable SIOC on a particular datastore and carry on doing their unmounts or whatnot.  Even in a disaster failover in SRM – say you're doing a forced failover during a disaster – things should carry on quite happily.  If the primary datastore can't be unmounted it won't stop the failover from occurring, though it may generate some errors.

The problem comes when doing a planned migration, as a few of our customers and the gentleman at blocksandbytes documented. 

Planned migration wants to ensure a clean unmount and detach of any datastore used by SRM, and it will fail and halt the recovery plan should any errors be encountered.  Since SIOC blocks an automated unmount/detach, we can't complete this step, and the planned migration will never finish until SIOC is disabled for the relevant datastores.

This is something that we're looking at doing differently in the future, but for now make sure this is part of your change control process:

When doing a planned migration, make sure you disable Storage IO Control before beginning.
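
If you want to script that step as part of your change control, the same thing can be done through the API.  A very rough pyVmomi sketch follows; the datastore path is a placeholder, and double-check the IORM spec and task names against the SDK reference for your version:

    # Rough sketch: disable Storage I/O Control on one datastore before a planned
    # migration, then re-enable it afterwards.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret")
    try:
        content = si.RetrieveContent()
        ds = content.searchIndex.FindByInventoryPath(
            "MyDatacenter/datastore/replicated-datastore-01")

        spec = vim.StorageResourceManager.IORMConfigSpec(enabled=False)
        content.storageResourceManager.ConfigureDatastoreIORM_Task(datastore=ds, spec=spec)
        # ... run the SRM planned migration, then repeat with enabled=True ...
    finally:
        Disconnect(si)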

Thanks to our great community (Messieurs McKaige, Lloyd) for bringing this to light!

@vmken

vSphere HA admission control calculation for memory

By Duncan Epping, Principal Architect.

Someone asked me today how the calculation is done for memory when the "Percentage Based" admission control policy is selected. As hopefully all of you know by now, the percentage-based admission control policy does not work with fixed slot sizes. (More details can be found in the Availability Guide and the Best Practices white paper.) The percentage-based admission control policy is a per-virtual-machine mechanism, which is why I always prefer it: it is the most flexible.

So how is the memory part calculated? Memory is calculated by taking the total amount of memory resources in the cluster and subtracting the virtualization overhead, such as agents and the VMkernel. That gives you the total amount of resources available for workloads in the cluster. If you have set your admission control percentage to 25, then that 25% is also subtracted from the total available for virtual machines, as it is reserved for a possible fail-over. What is left is what can be "claimed" by virtual machines.

By "claimed" I am referring to the fact that when a virtual machine is powered on, the memory resources that are reserved for it (including the memory overhead) are taken from the total amount of available resources. To give an example:

The cluster has 100GB of memory resources available, and the percentage reserved for fail-over is 10. There is also some minor additional memory overhead, which is subtracted as well. In total, 89GB of memory is available for workloads. A virtual machine with 8GB of memory and a 2GB memory reservation is powered on. From the 89GB, that 2GB memory reservation and the 300MB memory overhead are subtracted, leaving 86.7GB for workloads.
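
The running total from that example, as quick arithmetic (a toy calculation using the same rounded numbers as above):

    # Toy calculation: percentage-based admission control for memory.
    cluster_memory_gb   = 100.0
    failover_percentage = 10      # percentage of resources reserved for fail-over
    misc_overhead_gb    = 1.0     # the minor virtualization overhead assumed in the example

    available_gb = (cluster_memory_gb - misc_overhead_gb
                    - cluster_memory_gb * failover_percentage / 100.0)
    print(available_gb)           # 89.0 GB available for workloads

    # Power on a VM with 8GB configured, a 2GB reservation and ~300MB overhead:
    vm_reservation_gb = 2.0
    vm_overhead_gb    = 0.3
    available_gb -= vm_reservation_gb + vm_overhead_gb
    print(available_gb)           # ~86.7 GB left that can still be "claimed"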

SRM will not preserve reservations on failover

I've seen a good question a few times in the last few weeks:  will SRM preserve any reservations, limits, etc. that we've put on VMs when it fails over?

The answer is no, it will not!  Make sure you're failing over into appropriate resource pools if this is something that is important to you.  

The shadow VM will not look identical to the protected VM in this respect, so it's key to have an appropriate set of limits and reservations at the resource pool level into which you are failing over.

And moreover, keep in mind that if you then reprotect and failback… you'll need to set up your per-VM reservations and limits once more because technically it's a new VM we've (re)created on failback.

The moral of the story?  Use resource pools and resource mapping for both failover and failback to make sure your reservations and limits are preserved.

-Ken

***Edit***  My friend and respected colleague at VMware, Michael W, has pointed out quite rightly that you can manually edit the shadow VM's properties to reflect the required reservations.  This is of course another way to accomplish it if per-VM reservations are important to you.  I like making sure that the resource pools are mapped correctly to start with, though, and editing the shadow VM only if you need to be very specific.  Keep in mind it's not automatic – you have to do it by hand, as it were – and when the shadow is created on the primary site for a reprotect, you'll need to do it again before failback.

How to use Port-Mirroring feature of VDS for monitoring virtual machine traffic?

I would like to clarify a few things in this blog entry about the port mirroring feature that is available on the vSphere Distributed Switch (VDS). This feature is similar to the port mirroring capability available on physical switches. Network administrators can use it to troubleshoot network-related issues in the virtual infrastructure and to monitor virtual machine to virtual machine traffic flowing on the same ESXi host. Network administrators use a network analyzer tool, which captures traffic, along with the port mirroring feature to perform monitoring and troubleshooting activities. In the physical network, depending on where the analyzer or debug tool is placed, network administrators choose different port mirroring options. The following are some of the standard port mirroring options available on physical switches:

-       Switch Port Analyzer (SPAN)

-       Remote Switch Port Analyzer (RSPAN)

-       Encapsulated Remote Switch Port Analyzer (ERSPAN)

The SPAN feature is local to the switch and requires that the monitored ports and the destination port be on the same switch. With the release of vSphere 5.0, VMware provides support for only the SPAN feature on VDS. The following blog entry discusses the feature in a little more detail. During the setup of a SPAN session, customers select a virtual port that needs monitoring and then choose a destination virtual port where all the traffic will be mirrored. Here are some of the common monitoring and troubleshooting use cases, based on where the analyzer tool is running.

1)    Mirroring to an analyzer tool running in a virtual machine on the same host.

As shown in the figure below, you can have a virtual machine run the analyzer tool. In this scenario you configure the port mirror session with the virtual port of the monitored virtual machine as the source and the virtual port of the virtual machine running the analyzer tool as the destination.

[Figure: analyzer running in a virtual machine on the same host]

2)    Mirroring to an external physical analyzer connected directly to the uplink port of the host.

In this case the analyzer tool is running on an external physical device, which is directly connected to the host through a NIC. As shown in the figure below, the source virtual port of the port mirror session remains the same, but the destination is changed to the uplink port connected to vmnic1. The mirrored packets are sent through vmnic1 to the analyzer device for monitoring.

[Figure: analyzer connected directly to the host's uplink (vmnic1)]

3)    Mirroring to an external physical analyzer connected to a physical switch where the host is also connected.

This setup is possible provided you configure a SPAN session on the physical switch as well as on the VDS. Let’s dig a little more here. As mentioned earlier, the SPAN feature is local to a switch and requires both the monitored and destination ports to be on the same switch. If you look at the diagram below, the analyzer is not directly connected to the VDS; it is connected through a physical switch. So this is not as straightforward as use case 2.

Let’s take a look at the mirror packet flow. The port mirror session is configured on the VDS with the virtual port of the monitored virtual machine as the source and the uplink connected to vmnic1 as the destination. All packets flowing through the monitored virtual machine are now copied through vmnic1 to the physical switch port. On the same physical switch, the analyzer is connected to a different port, and an analyzer connected to that port is not going to see the traffic mirrored by the VDS. For this use case to work, it is not enough to configure the port mirror session on the VDS; you also have to configure a SPAN session on the physical switch, with the switch port where the host’s vmnic1 is connected as the monitored port and the port where the analyzer is connected as the destination port.

[Figure: analyzer connected to another port on the physical switch, requiring a SPAN session on that switch as well]

VDS currently doesn’t support the RSPAN capability, which allows network administrators to monitor traffic remotely, multiple hops away from the source. Customers have to create a dedicated VLAN to carry the RSPAN traffic, and the switches supporting the RSPAN feature have to encapsulate all the monitored traffic in this special VLAN.

There is also some confusion because of the GUI screen options provided during the port mirroring setup on VDS.

If you take a look at the configuration screen shown below, there is an encapsulation option shown in the red box. This encapsulation option gives the impression that RSPAN is supported. However, it is not, and you shouldn't configure this parameter.

[Screenshot: port mirroring configuration screen with the encapsulation option highlighted]
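
Incidentally, a port mirror session like the one in use case 1 can also be created by reconfiguring the VDS through the API. The pyVmomi sketch below is only a rough outline: the switch path, port keys and session name are placeholders, and the VSPAN spec classes and fields should be verified against the vSphere API reference for your version.

    # Rough outline only: add a SPAN (port mirror) session to a VDS.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret")
    try:
        content = si.RetrieveContent()
        dvs = content.searchIndex.FindByInventoryPath("MyDatacenter/network/dvSwitch01")

        vmwdvs = vim.dvs.VmwareDistributedVirtualSwitch
        session = vmwdvs.VspanSession(
            name="mirror-vm1-to-analyzer",
            enabled=True,
            # Mirror what the monitored VM's port sends and receives...
            sourcePortTransmitted=vmwdvs.VspanPorts(portKey=["10"]),
            sourcePortReceived=vmwdvs.VspanPorts(portKey=["10"]),
            # ...to the virtual port of the VM running the analyzer tool.
            destinationPort=vmwdvs.VspanPorts(portKey=["20"]))

        spec = vmwdvs.ConfigSpec(
            configVersion=dvs.config.configVersion,
            vspanConfigSpec=[vmwdvs.VspanConfigSpec(operation="add", vspanSession=session)])
        dvs.ReconfigureDvs_Task(spec=spec)
    finally:
        Disconnect(si)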

As usual, I would like to hear your comments. Thanks for reading.

Get notification of these blogs postings and more VMware Networking information by following me on Twitter:  @VMWNetworking

 

Multi-disk replication into common directory with vSphere Replication


Posted by Ken Werneburg, Tech Marketing

When replicating VMs with vSphere Replication you have control over the target location for the VMDKs being replicated.  For example, you may have a VM with two disks and choose to place them in different directories and locations than they were in at the primary site from which they are being copied.

That's great, but something that never came up with array replication is what happens when we want to put all the disks in the *same* directory at the recovery site?

If we have a machine ("VM1") with two VMDKs on different datastores, for example, they might both be named the same.  We might have two "VM1.vmdk" files.  In fact, this happens by default – the first disk in a directory is named for the parent VM, without consideration for the fact that the same VM may have other, identically named disks on other datastores.

Disk one: 

[Screenshot: properties of the first disk, VM1.vmdk]

Looks a lot like disk 2:

[Screenshot: properties of the second disk, also VM1.vmdk]

Obviously, attempting to replicate these disks into a single directory on the same datastore will cause difficulty at the recovery site.  We can't have more than one VMDK with a particular name in a directory.

vSphere Replication is smart – it will detect that an existing disk is in that directory – but it is not so smart that it understands what that disk is.  It quite cleverly assumes this is a seed disk you have copied in via sneakernet.  If you've got this far with a multi-disk VM, don't choose this option, or you'll use the 'first' disk as a seed and VR will do a full sync of the second disk against it, overwriting it!

[Screenshot: the replication wizard offering to use the existing disk as a seed – it is not a seed]

What are the options?  Well, first, you can get to the CLI and use vmkfstools to detach, rename, and re-attach the disk, then replicate to your heart's content.
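
As a variation on the CLI route, the rename step can also be driven through the vCenter API, since the virtual disk manager treats a rename as a move within the same datastore.  A rough pyVmomi sketch with placeholder paths, for a disk that has already been detached from the VM:

    # Rough sketch: rename a detached VMDK by "moving" it within the same datastore.
    from pyVim.connect import SmartConnect, Disconnect

    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret")
    try:
        content = si.RetrieveContent()
        dc = content.searchIndex.FindByInventoryPath("MyDatacenter")

        content.virtualDiskManager.MoveVirtualDisk_Task(
            sourceName="[datastore2] VM1/VM1.vmdk",
            sourceDatacenter=dc,
            destName="[datastore2] VM1/VM1_data.vmdk",
            destDatacenter=dc)
        # Once the task completes, re-attach the renamed disk and set up replication.
    finally:
        Disconnect(si)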

Another idea is to give each disk its own folder at the recovery site.  I.e., "VM1" is the main folder, and within it there will be the VMX and all the other items and sundry for the VM, while the individual VMDKs are replicated into other directories (such as "OS Disk" and "Data Disk").  Each directory will then contain its own, identically named "vm1.vmdk".

[Screenshot: each replicated disk placed in its own folder]

Set your VR target per disk, and choose the appropriate directory at the recovery site by selecting "specify datastore folder".  This keeps your data separate, retains easy management and the ability to identify a VMDK by function, and gives you the freedom to use separate datastores at the source and a single datastore at the recovery site.

-Ken

 

Leveraging vApp & VM Custom Properties In vCloud Director

By William Lam, Sr. Technical Marketing Engineer

A neat little feature that I have been exploring in vCloud Director is Custom Properties for a vApp and virtual machine. This was one of the new features in the vCloud Director 1.5 release; it allows users to easily pass data into the guest OSes using OVF descriptors. I found Custom Properties to be a very interesting capability, as it allows users to dynamically provision a vApp and present a variety of custom data for the guest OS to access, such as guest OS customization, asset tag information, and application startup parameters. This last use case is really cool in my opinion, as it allows you to customize how an application should be configured and started based on some criteria.

Continue reading

VMware Workstation Tech Preview 2 available

By Duncan Epping, Principal Architect.

I just noticed on Twitter that VMware Workstation Tech Preview 2 has been released. I know this blog is about vSphere, not Workstation, but I feel it is important enough to mention. One of the reasons is the fact that with VMware Workstation TP2 you can:

  1. Upload virtual machines to vSphere.
  2. Download virtual machines from vSphere by dragging them from the remote host to the My Computer section of the Virtual Machine Library

Is that useful or what? Just imagine you want a "local" copy of that server to do some testing. You just drag and drop it from your server onto your desktop and you are done. Another reason to look at TP2 is the fact that the virtualization of "Intel VT/EPT and AMD-V/RVI" has been enhanced. I personally use this functionality at home to run a fully virtualized lab! Because of this feature I can even run 64-bit guests on my virtualized ESXi hosts. Just to give you an idea of the capabilities of VMware Workstation, this is what my virtualized lab looks like:

  • vCenter Server
  • 2 x ESXi with 4GB each
  • vCloud Director
  • vShield Manager

I even used this virtual lab to test the DR of the Cloud solution. All it takes is one desktop with an SSD and enough memory (24GB at this point). Nice, right?!

Of course there are more great enhancements, and I want to point you to the "what's new" document for all the cool details. One of the new features that I personally think is very cool is called WSX, a pet project of Christian Hammond. Many of you have used the "VMRC" plugin that comes, for instance, with vCloud Director, and as you know it doesn't support all OS platforms right now… Wouldn't it be great if WSX could evolve into something more widely used? It could be used for anything, in my opinion, and I guess right now things are still open, hence the reason I want to ask all of you to try it out!

If you have downloaded the tech preview of VMware Workstation, make sure to update it to the latest version. If you have used WSX and want to provide feedback, either use the Tech Preview community forums or drop a note on Christian's blog. Even if you just have ideas around future usage of WSX, don't hesitate to drop a note on Christian's blog; it might be a use case that no one has come up with before.