Fault Tolerance Performance in vSphere 6

VMware has published a technical white paper about vSphere 6 Fault Tolerance architecture and performance. The paper describes which types of applications work best in virtual machines with vSphere FT enabled.

VMware vSphere Fault Tolerance (FT) provides continuous availability to virtual machines that require very high uptime. If the primary virtual machine fails, a secondary virtual machine is ready to take over immediately. vSphere achieves FT by maintaining primary and secondary virtual machines using a new technology named Fast Checkpointing. This technology is similar to Storage vMotion, which copies the virtual machine state (storage, memory, and networking) to the secondary ESXi host. Fast Checkpointing keeps the primary and secondary virtual machines in sync.

vSphere FT works with (and requires) vSphere HA—when an administrator enables FT, vSphere HA selects the secondary VM (admins can vMotion the VM to another server if needed). vSphere HA also creates a new secondary if the primary fails—the original secondary becomes the new primary, and vSphere HA selects an available virtual machine to use as the new secondary.

vSphere 6 FT supports applications with up to 4 vCPUs and 64GB memory on the ESXi host. The performance study shows results for various workloads run on virtual machines with 1, 2, and 4 vCPUs.

The workloads—which tax the virtual machine’s CPU, disk, and network—include:

  • Kernel compile – loads the CPU at 100%
  • Netperf – measures network throughput and latency
  • Iometer – characterizes the storage I/O of a Microsoft Windows virtual machine
  • Swingbench – drives an OLTP load on a virtual machine running Oracle 11g
  • DVD Store – drives an OLTP load on a virtual machine running Microsoft SQL Server 2012
  • A brokerage workload – simulates an OLTP load of a brokerage firm
  • vCenter Server workload – simulates actions performed in vCenter Server

Testing shows that vSphere FT can successfully protect a range of workloads, including CPU-bound workloads, I/O-bound workloads, servers, and complex database workloads; however, admins should not use vSphere FT to protect highly latency-sensitive applications like voice over IP (VoIP) or high-frequency trading (HFT).

For the results of these tests, read the paper. Also useful is the VMware Fault Tolerance FAQ.


8 comments have been added so far

  1. Our primary application is a small SQL installation with 5 users. It is a 911 dispatch environment. With FT on, dispatch staff notice delays of half a second to 3 seconds in screen input. Without FT, everything runs fine without delays. We bought into VMware specifically with the intent to use FT. Will there ever be a better environment for FT (without latency) than there is now? Can we ever expect FT to work in a latency-sensitive environment?

  2. Same problem as Ross, here with vSphere 6.0U2 and vSAN. We have VMkernel ports for FT and vMotion on 10Gb links; the VM is connected at 1Gb.

    Network throughput without FT: ~80 MByte/s
    Network throughput with FT: ~2 MByte/s

    Latency without FT: <1ms
    Latency with FT: between 10ms and 200ms

    The VM is Server 2012 R2 without any workload! It's unusable...

  3. Hi Ross and Don, we will be sharing some “tech preview” performance data in a breakout session at VMworld in Las Vegas. Will you be there? We’ve made some significant improvements to FT for workloads that are more susceptible to latency increases, so while there is still some performance overhead, you will see better performance vs. what you did in vSphere 6.

    Are you using 10Gb NICs for the FT logging network, separate from the management or vMotion networks? Which Intel / AMD processor generations? Happy to do a call with you and our engineering team to discuss these specific cases. Please email me at if you’re interested in setting up a discussion.

  4. Guys, just weighing in here too. We turned on FT for a single PostgreSQL server POC. We're using Cisco B200 M4s with VIC 1340s and a 10Gb network dedicated to vMotion and FT, separated by load balancing, so FT goes down one NIC and vMotion goes down the other as long as both are online (and they are). My SQL DBA and I wanted to see what the effects of FT were, and we were unpleasantly surprised. Here are some results from performance testing; notice "number of transactions actually processed" in both cases. Night and day difference. We also had high hopes of using FT instead of the PostgreSQL replication engine.
    Test results from PGBench follow:

    Mixed Load- Before FT

    [root@pgbarman01 ]# pgbench -T 600 -j 2 -c 4 -U postgres -h pgpoc01 CustomerOffer
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 1
    query mode: simple
    number of clients: 4
    number of threads: 2
    duration: 600 s
    number of transactions actually processed: 493832
    tps = 823.044195 (including connections establishing)
    tps = 823.061408 (excluding connections establishing)

    Mixed Load- After FT

    [root@pgbarman01 ]# pgbench -T 600 -j 2 -c 4 -U postgres -h pgpoc01 CustomerOffer
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 1
    query mode: simple
    number of clients: 4
    number of threads: 2
    duration: 600 s
    number of transactions actually processed: 8859
    tps = 14.746183 (including connections establishing)
    tps = 14.750129 (excluding connections establishing)

    1. Hi Kevin, can we jump on a call to find out exactly what PostgreSQL is doing while it's being protected by FT? Everything that is written to memory is captured in an FT checkpoint and held back from the end user until the changes are acknowledged by the secondary FT VM, and there might be optimizations we can make depending on what the DB is doing. We'll also post an updated performance paper for vSphere 6.5 in the next few months.

      Please email me at and we’ll set up some time to go through what you’re seeing.

  5. Hey there,

    How can I configure FT to place my secondary VM on a different datastore? On my 6.0 cluster, the primary and secondary VMs are still on the same datastore 🙁

  6. We have a cluster of three hosts that are part of an HP C7000 blade enclosure. Everything is connected via a 10Gb backbone. We have three instances of SolarWinds NMS running on three Windows 2012 R2 servers, and they all share the same SQL server. All four servers are protected by FT; however, when FT is enabled, their response time is very long. Without FT, all three perform as they should. FT logging is right around 44,000 KBps. Turning FT off on the SolarWinds servers fixes the issue. We did not turn FT off on the database server. Any ideas? Processor and memory utilization is very low.
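
To put the pgbench figures from comment 4 and the hold-back explanation in its reply side by side, here is a rough Python sketch. It makes two assumptions that are not stated in the thread: the second pgbench run is the FT-enabled one, and the four pgbench clients run closed-loop (each waits for a reply before sending the next transaction), so mean latency is clients divided by throughput. The `FtLink` class, its field names, and the millisecond values passed to it are hypothetical illustrations, not VMware parameters or APIs.

```python
import math
from dataclasses import dataclass

# Figures copied from the pgbench output in comment 4 (4 clients, 600 s runs).
CLIENTS = 4
TPS_WITHOUT_FT = 823.061408  # first run, excluding connection establishment
TPS_WITH_FT = 14.750129      # second run, assumed here to be the FT-enabled one


def mean_latency_ms(clients: int, tps: float) -> float:
    """Average time per transaction for closed-loop clients (Little's law)."""
    return clients / tps * 1000.0


@dataclass
class FtLink:
    """Hypothetical model of the FT logging path; not a VMware API."""
    checkpoint_interval_ms: float  # assumed spacing between checkpoints
    transfer_ms: float             # assumed time to ship one checkpoint
    ack_ms: float                  # assumed time for the secondary's ack to return


def release_time_ms(output_time_ms: float, link: FtLink) -> float:
    """When output produced at output_time_ms becomes visible to the client."""
    # Output is held until the next checkpoint boundary at or after it...
    boundary = (
        math.ceil(output_time_ms / link.checkpoint_interval_ms)
        * link.checkpoint_interval_ms
    )
    # ...and then until that checkpoint has been shipped and acknowledged.
    return boundary + link.transfer_ms + link.ack_ms


if __name__ == "__main__":
    base = mean_latency_ms(CLIENTS, TPS_WITHOUT_FT)
    with_ft = mean_latency_ms(CLIENTS, TPS_WITH_FT)
    print(f"throughput drop: {TPS_WITHOUT_FT / TPS_WITH_FT:.1f}x")
    print(f"mean latency without FT: {base:.2f} ms, with FT: {with_ft:.1f} ms")

    # Made-up link values, only to show how held-back output adds delay.
    link = FtLink(checkpoint_interval_ms=10, transfer_ms=2, ack_ms=1)
    print(f"output at t=3 ms released at t={release_time_ms(3, link)} ms")
```

Under those assumptions, the reported throughput works out to roughly 5 ms per transaction without FT and roughly 270 ms with it. The sketch does not explain where that time goes in Kevin's setup; it only shows why a workload that waits on every reply is hit hardest when each reply is held until the secondary acknowledges a checkpoint.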
