Timekeeping within ESXi

Overview

Precise timekeeping is a key attribute for many types of applications. At a high level, it allows these applications to accurately reconstruct the precise sequence of events that have occurred or are occurring in real time. For example, in the financial services industry, it’s important to timestamp transactions with sufficient granularity and accuracy relative to a reference standard clock to be able to determine the execution requests from different counterparties, the correct execution order, and when the trades are retired for auditing purposes. In the telecom industry, applications supporting voice communication need to timestamp incoming packets from different sources in order to re-assemble them in the correct order (or discard them) for playback to the handset. Similarly, the entertainment industry often deploys multiple pieces of audio/video equipment with high sampling frequencies, whose outputs are re-assembled prior to editing or playback. In all these applications, the events can happen in any order, and the results of these events can arrive for processing in any order, but there must be sufficient timing information to correctly re-create the actual order.

We’re starting a series on timekeeping to provide customers with information on the level of accuracy that can be achieved within the vSphere environment today, as well as to explore potential investments we’re making to deliver even greater time accuracy tomorrow. This first article in the series discusses the current state of timekeeping in the vSphere environment. We will use it to establish terminology and describe the existing industry-standard protocols used to synchronize time. We will also share results that were achieved using open source and commercial software offerings on a reference testbed. The goal is for these results to serve as input for customers who are planning their infrastructure to support time-sensitive applications.

– ESXi timekeeping team

Timekeeping in Computers

Most modern PC operating systems have what’s known as a system clock that provides timestamps and time-based callback services. Typically, the kernel has a built-in timekeeping sub-system that uses one or more hardware timer devices in the platform to track the passage of time and provide this clock. The techniques used to track time differ between operating systems, but they usually fall into one of two broad categories: those that count interrupts firing periodically at fixed intervals, called tick-counting systems, and those that rely on hardware counters to measure elapsed time, called tickless or dynamically-ticking systems.
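
To make the distinction concrete, the following minimal sketch (in Python, with hypothetical tick and counter frequencies) contrasts the two approaches:

    NANOS_PER_SEC = 1_000_000_000

    # Tick-counting: a timer interrupt fires at a fixed rate and the kernel
    # counts ticks; time resolution is limited to one tick period.
    TICK_HZ = 100          # hypothetical interrupt rate (100 Hz => 10 ms ticks)
    ticks = 0              # incremented once per timer interrupt

    def tick_counting_time_ns():
        return ticks * (NANOS_PER_SEC // TICK_HZ)

    # Tickless: read a free-running hardware counter (such as the TSC) on
    # demand and scale the elapsed count to nanoseconds.
    COUNTER_HZ = 2_200_000_000   # hypothetical 2.2 GHz counter frequency

    def tickless_time_ns(counter_now, counter_at_boot):
        return (counter_now - counter_at_boot) * NANOS_PER_SEC // COUNTER_HZ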

It has become increasingly common for popular operating systems running on x86 hardware to rely on the CPU’s Time Stamp Counters (TSCs) for timekeeping. These per-CPU hardware counters are cheap to read, scalable, and high-resolution. Although TSCs were once plagued by reliability issues due to their vulnerability to CPU clock speed variations, major microprocessor vendors have since introduced improvements that make them highly suited for timekeeping. Besides TSCs, operating systems also rely on devices like the High Precision Event Timer (HPET) and the Local APIC timer to generate time-based interrupts.

VMware virtual machines are afforded all of the modern x86 timer hardware in the form of virtual devices. System clocks of common guest operating systems work seamlessly without requiring any modifications. In keeping with the trend, VMware’s virtual hardware also provides reliable virtual TSCs. Using hardware virtualization features available in both Intel and AMD microprocessors, guests get all the benefits of TSC hardware at performance that is nearly indistinguishable from non-virtualized environments. More information is available in VMware’s Timekeeping in Virtual Machines information guide.

Accuracy

Whether or not a system clock can be applied to a particular use case depends on its accuracy. A calendaring application might be able to tolerate a clock that is off by a few seconds, whereas a financial trading system subject to regulations has hard requirements on accuracy, precision and resolution. Generally speaking, the accuracy of a clock is a measure of how close its time and frequency are with respect to a reference clock or time standard. This discussion only considers UTC (Coordinated Universal Time) as the reference, the world’s primary time standard for scientific, engineering, financial and various other activities. It is implemented using atomic clocks by scientific laboratories around the world, and disseminated via radio signals and GPS satellites.

The precision of a clock describes how consistent its time and frequency are relative to a reference time source, when measured repeatedly. The distinction between precision and accuracy is subtle but important. A clock may be inaccurate by being at an offset from a reference clock, but it is precise if it maintains that offset over repeated measurements. The achievable accuracy of a clock depends on its precision.

The resolution of a clock is the smallest possible difference between two consecutive measurements. For example, system clocks based on TSCs can achieve resolution on the order of nanoseconds. A more traditional tick-counting system may be limited in resolution by the system tick period. Resolution sets a lower bound on the achievable accuracy of a system clock.
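
For example, from a Linux guest the advertised and observed resolution of the monotonic system clock can be probed with a few lines of Python:

    import time

    # Advertised resolution of the monotonic clock, in seconds.
    print(time.clock_getres(time.CLOCK_MONOTONIC))

    # Observed resolution: the smallest non-zero difference between
    # back-to-back readings of the clock, in nanoseconds.
    reads = [time.clock_gettime_ns(time.CLOCK_MONOTONIC) for _ in range(100_000)]
    print(min(b - a for a, b in zip(reads, reads[1:]) if b != a), "ns")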

The accuracy of computer clocks is subject to various factors. The oscillators underlying most hardware timer devices have imperfections resulting in static and dynamic errors. Manufacturing defects and impurities introduce errors in an oscillator’s output frequency, resulting in system clock skew. Environmental factors such as changes in ambient temperature can cause the oscillator frequency to deviate from its expected value, causing the system clock to drift. Sometimes, an error made by a system operator in initializing the on-board real-time clock can leave the system clock diverged from true time from the outset. On top of all that, since the kernel’s software timekeeping subsystem is the final aggregator of timing information, small errors in timer arithmetic, or even undue latencies between reading a timer device and updating the kernel timekeeping variables, can set a lower bound on the accuracy of the system clock.

Time Synchronization

Time synchronization is the process of improving the accuracy of a clock by aligning its time and frequency to a reference time standard. The general approach is to first compute the clock’s offset from its reference, and to subsequently make adjustments to minimize that offset. Adjustments can be made in the form of step corrections, where the clock’s time value is set directly to the correct value, or in the form of frequency corrections, which steer the clock gradually toward its ideal value.
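
As a rough sketch of this decision, synchronization daemons typically step large offsets and slew small ones; ntpd’s default step threshold, for instance, is 128 ms. In the Python sketch below, step_clock() and slew_clock() are hypothetical stand-ins for platform facilities such as clock_settime() and adjtime() on POSIX systems:

    # Offsets larger than the threshold are stepped; smaller offsets are slewed.
    STEP_THRESHOLD = 0.128   # seconds (ntpd's default step threshold)

    def step_clock(offset):
        # Stand-in for a facility like clock_settime(): jump the clock
        # directly by the measured offset.
        print(f"stepping clock by {offset:+.6f} s")

    def slew_clock(offset):
        # Stand-in for a facility like adjtime(): temporarily run the clock
        # slightly faster or slower until the offset drains away.
        print(f"slewing clock to remove {offset:+.6f} s")

    def apply_correction(offset):
        if abs(offset) > STEP_THRESHOLD:
            step_clock(offset)
        else:
            slew_clock(offset)

    apply_correction(-0.050)   # 50 ms behind: small enough to slew
    apply_correction(2.5)      # 2.5 s ahead: stepped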

The ultimate reference for synchronizing to UTC time is information disseminated by time laboratories via radio signals or GPS satellites. But due to reasons of cost and convenience, not all systems have access to such high-precision sources. Instead, a small set of primary hosts with access to necessary equipment synchronize their clocks to precision time sources, and in turn, act as reference time sources for a larger swathe of secondary systems over common network infrastructure. This is generalized to a hierarchy, where computers at each level act as the reference time source for those in the next level, and so on. An important trade-off in this approach is that common network infrastructure is prone to variabilities that affect estimation of clock offsets. The further down a clock is in the distribution hierarchy, the less accurate its estimates and the poorer its synchronization.

The synchronization of computer clocks over a network is referred to as Computer Network Time Synchronization. The two well known time synchronization protocols for IP based networks are Network Time Protocol (NTP) and Precision Time Protocol (PTP). Both protocols have a master-slave or client-server architecture, where the client establishes its time offset through a series of packet exchanges with one or more time servers.

A noteworthy problem in network time synchronization is the estimation of the propagation delay of network packets from the server to the client, which must be accounted for when computing time offsets. Ideally, the delays are constant and symmetric in both directions, which allows the client to estimate the one-way delay as half of the round-trip time of a request to the server. In practice, various factors can affect the propagation delay of network packets on IP networks, causing packet delay variations. Common causes include queuing, buffering and other uncertainties in various network elements such as routers, switches, network interface cards, network device drivers, the TCP/IP stack, etc.
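
Concretely, each NTP exchange records four timestamps: the client’s transmit time (t1), the server’s receive (t2) and transmit (t3) times, and the client’s receive time (t4). The standard offset and delay computations, shown below in Python, are exact only under the symmetric-delay assumption:

    def ntp_offset_and_delay(t1, t2, t3, t4):
        """Standard NTP offset/delay computation (RFC 5905), all times in
        seconds: t1 = client transmit, t2 = server receive, t3 = server
        transmit, t4 = client receive."""
        offset = ((t2 - t1) + (t3 - t4)) / 2
        delay = (t4 - t1) - (t3 - t2)
        return offset, delay

    # A client 50 ms behind the server with a symmetric 10 ms one-way delay:
    print(ntp_offset_and_delay(0.000, 0.060, 0.061, 0.021))
    # -> approximately (0.05, 0.02)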

NTP

NTP is a low-cost, fault-tolerant network time synchronization protocol used to align computer clocks to UTC time. Developed by David L. Mills at the University of Delaware in the 1980s, it remains one of the most popular and widely deployed industry standards for time synchronization.

NTP can synchronize a clock to a wide range of time sources, from high-precision atomic clocks and GPS receivers to servers geographically distributed over the Internet. The protocol has a client-server model, with the clock that needs to be synchronized acting as the client, requesting timing information from one or more servers. Clients synchronized via NTP can also act as servers providing information downstream, creating a hierarchy of clocks. Each level in the hierarchy is called a stratum, numbered 0 through 15 (a stratum of 16 denotes an unsynchronized clock). Stratum 0 consists of high-precision timekeeping devices like atomic clocks. Stratum 1 computers synchronize to stratum 0, stratum 2 to stratum 1, and so on. The achievable accuracy degrades with increasing stratum level.

The reference implementation of NTP, the ntpd daemon, has been in active development for more than 20 years and is the most widely used implementation of NTP. In addition, there are many third-party implementations such as Chrony, Ntimed, FSMLabs TimeKeeper, etc.

ESXi has built-in NTP capability, including a port of the reference implementation and the necessary kernel APIs to support it. We highly recommend synchronizing your ESXi hosts using NTP. See KB 2012069 for more information.

NTP in a Virtual Machine

NTP operates seamlessly inside VMware virtual machines, as long as the guest operating system supports it and there is network connectivity to time servers. When compared to a non-virtualized system, a couple of differences are worth highlighting. First, NTP in a VM operates on virtual timer hardware devices. Under most circumstances this should have no impact on synchronization, except when life-cycle events disrupt guest execution. Second, virtualization adds additional network elements (virtual network interface, virtual switch, etc.) between the clock and the time servers. These additional software layers can potentially add variability that may affect synchronization.

VM life-cycle operations that disrupt guest execution present unique challenges to timekeeping and time synchronization. For instance, suspending a VM to disk completely stops all guest execution, including all of its timekeeping and time synchronization. Meanwhile, true physical time continues to make progress. When the VM is resumed, its notion of the current time lags behind. NTP was not designed for virtualized environments, and it has no conception of a clock that is deliberately stopped and resumed; it perceives events like this as anomalies. Virtual machines running VMware Tools remediate problems like this by performing a step correction of the system clock as soon as guest execution resumes.

So, what is the achievable accuracy using NTP in a VM? Unfortunately, there isn’t a single, simple answer to this question. The achievable accuracy of any time synchronization solution depends on many different factors: the guest operating system and its timekeeping abilities, the quality of the host hardware, the network load and infrastructure, the load on the client host, environmental factors, and so on. Solutions like NTP use sophisticated statistical methods to eliminate much of this variability when computing time offsets, but the process is still probabilistic in nature.

To establish a reasonable baseline for the achievable accuracy of NTP in a VM, we set up an idealized testbed with minimal variability, as described below.

  • ESXi Host: Dell PowerEdge R630 with Intel Xeon CPU E5-2630 v4 (2.20GHz) and 128GB of memory, running ESXi 6.5 Update 1. The machine had 2 sockets with 10 physical cores each and hyper-threading enabled, yielding 40 logical cores. An on-board network interface (Intel X710 10GbE SFP+) served as a single uplink for the vSwitch serving both the Management Network and the VM Network port groups.
  • NTP Server: Meinberg LANTIME M1000 linked to a GPS receiver functioning as a stratum 1 NTP server.
  • Virtual Machines under Test: All VMs under test were configured at virtual hardware version 13. They were provisioned with 2 Virtual CPUs, 4GB of memory and 1 VMXNET3 Virtual NIC each.
  • Guest OS: Red Hat Enterprise Linux 7.3 with Linux kernel version 3.10.0-514.el7.x86_64.
  • NTP network: In this setup, the virtual machines under test act as NTP clients synchronizing their system clocks to the NTP server (M1000). Both the NTP server and the ESXi host share the same subnet over a physical switch. NTP packets from the VMs traverse the VMXNET3 layer, followed by the host virtual switch, which forwards them to the physical NIC; from there they make their way through the physical switch and over to the network interface of the NTP server. Packets on the return trip traverse the same path in reverse. In NTP terminology, the NTP server is at stratum 1 and the client VMs are at stratum 2.
  • Load: The ESXi host was under-subscribed both in terms of CPU and memory usage. Other than the NTP synchronization software running in the Guest, Virtual Machines had no active workloads, running idle for the duration of the test.

All tests were conducted over a 24 hour period. Even though it is good practice for NTP client software to be configured with two or more servers for redundancy, the tests were configured to use the only NTP Server in the testbed, since this was a controlled and monitored environment.

NTP Daemon – Reference Implementation

The NTP reference implementation is a freely available software suite that has been in active development for more than 20 years. In this test, we ran NTP version 4.2.6p5, available in RHEL 7.3, for 24 hours under idle guest conditions. The daemon was configured to record loop filter statistics, including an estimated offset of the local clock from the remote server, each time the local system clock is updated.
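
The exact testbed configuration is not reproduced here, but an ntp.conf for this kind of setup, with a single server and loop filter statistics enabled, would look roughly like the following (the server address is a placeholder):

    # Minimal ntp.conf sketch; 192.0.2.10 is a placeholder for the
    # testbed's stratum 1 server.
    server 192.0.2.10 iburst
    driftfile /var/lib/ntp/drift

    # Record loop filter statistics (including the estimated offset)
    # each time ntpd updates the local clock.
    statsdir /var/log/ntpstats/
    statistics loopstats
    filegen loopstats file loopstats type day enable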

The collected statistics indicated that the guest started off at a time offset of approximately -50 ms, which NTP gradually corrected to within a millisecond over a period of approximately 3 hours. After the 6-hour mark, offsets stabilized to within +/-400 microseconds for the remainder of the test.

[Figure: plot of offsets over the entire duration of the test]

[Figure: plot of offsets starting at the 6-hour mark]


Chrony

Chrony is an implementation of NTP which can be used to synchronize a system clock to NTP servers. To quote the Chrony project documentation:

It is designed to perform well in a wide range of conditions, including intermittent network connections, heavily congested networks, changing temperatures (ordinary computer clocks are sensitive to temperature), and systems that do not run continuously, or run on a virtual machine.

In the RHEL 7.3 image used in these experiments, the Chrony software suite was available and enabled by default. Chrony consists of a daemon called chronyd as well as a command-line client program called chronyc. For the purpose of this test, a basic configuration with a single server entry pointing to the testbed NTP server was used, with additional entries enabling the collection of statistics.
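
A chrony.conf along these lines, with the server address again a placeholder, would look roughly like:

    # Minimal chrony.conf sketch; 192.0.2.10 is a placeholder for the
    # testbed's stratum 1 server.
    server 192.0.2.10 iburst
    driftfile /var/lib/chrony/drift

    # Log tracking, measurement and statistics data for later analysis.
    logdir /var/log/chrony
    log tracking measurements statistics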

The initial guest system time started off at an offset of about -300 milliseconds from the reference server. Within a few minutes, Chrony synchronized the system time to within 1 millisecond, and after an hour it stabilized to within +120 to -300 microseconds for the remainder of the test.

[Figure: plot of offsets over the entire test duration]

[Figure: plot of offsets starting at the 1-hour mark]

FSMLabs TimeKeeper Client

TimeKeeper is an end-to-end solution for clock synchronization, providing accurate time to applications. Designed for the extreme requirements of high-speed financial trading, TimeKeeper technology has applications in any field where programs running on virtual or physical devices need access to a reliable reference time. The TimeKeeper client software can synchronize clocks on Linux, Windows, and Solaris application servers to one or more reference sources connected over the network.

For this evaluation, we used version 8.0.0 of the FSMLabs TimeKeeper Client, configured against the single testbed NTP server. Within a few minutes of starting the client, an initial system time offset of -250 milliseconds was corrected to sub-millisecond values. After that, TimeKeeper stabilized the guest system time to within +/-100 microseconds for the remainder of the test.

[Figure: plot of offsets reported by TimeKeeper over the entire duration of the test]

[Figure: plot of offsets starting at the 1-hour mark]

Conclusion

The experimental results described above demonstrate the accuracy achievable using both commercial and open source time synchronization services in a specific test environment. When designing an infrastructure to support time-sensitive applications, it’s important that you also take into consideration other factors such as CPU load, network congestion and the quality of reference time sources. Depending on the infrastructure and its various loads, you may observe worse or even better results.

In future articles, we plan to explore the behavior of the system under overcommitment scenarios and the impact of VM life-cycle events such as vMotion, resume from suspend, and snapshotting, as well as discuss potential investments to improve time accuracy in vSphere environments.

References