VAssert is a new API, debuting in Workstation 6.5, that uses the Record and Replay functionality that we’ve been talking about for some time now. As you can tell by its name, VAssert is a relative of your standard programming ASSERT debugging tool, but by delaying assert-checking until later, when the exact machine instructions are replayed, it can be very fast. That’s some virtualization Deep Magic.
VMware engineers Weiming Zeng and Min Xu give us this guest post on demonstrating VAssert within Apache, and include the Apache patches they used so you can give this a try at home.
A Virtual Buffer-overflow Checker for Apache
by Weiming Zeng & Min Xu
The Record and Replay feature in Workstation 6.5 introduces a new guest programming API – VAssert (Virtual Assertions). It is intended to let software developers move expensive program error checking, such as buffer-overflow detection, to the deterministic replay phase. But does VAssert live up to its promise? As an experiment, we applied VAssert to Apache httpd and wrote a simple buffer-overflow checker by modifying the memory manager in the Apache Portable Runtime (APR). Compared with the same buffer-overflow checker implemented using traditional assertions, the virtual assertions incur 78.77% less runtime overhead.
2 The idea
Our idea for detecting buffer overflows is simple. When allocating memory, append a one-byte magic number (the guard) to the end of the memory block; during execution, frequently check whether the guard has changed. If it has, a buffer overflow has been detected.
One of the benefits of this detector is that it is simple to implement. There is no need to intercept all (or most) memory accesses, as other detectors require. But this detector can cause a huge program slowdown if the guard bytes are checked frequently. The slowdown might alter a program’s behavior so that bugs disappear when the detector is activated. With a “virtual” detector, however, the slowdown happens mostly during replay time. Since the replay is deterministic, the detector can find bugs without altering a program’s behavior.
3.1 Record and Replay
Our Record and Replay works at an instruction-exact level. During
replay, the exact same internal program state is recreated, not just
the program screen output.
This is useful for application debugging and testing, especially for
multi-threaded programs. Often the behavior of these threads
differs, and may never be the same across different runs
of the program. With Record/Replay, all the things that ever happened
during the recording will be reproduced exactly when replayed. Problems
with irreproducible bugs will no longer trouble the developers. Still
confused by Record and Replay? See this blog post for more information.
While replaying, we provide several methods for developers to probe
the internal state of a program. You can use a debugger or insert
replay-time-only code, such as virtual assertions.
VAssert provides an API, named VAssert_Assert(), that resembles an
ordinary ASSERT(). It is intended to be used like an ordinary ASSERT(),
but it takes effect only in replay mode.
Internally, the code called from VAssert_Assert() is skipped unless
a program is replayed. The place where a programmer calls
VAssert_Assert() is really a placeholder for inspecting the internal
state of the program during replay time. During replay, once a VAssert
is evaluated successfully, the virtual machine goes back to the state
before the VAssert and continues normal replaying. Otherwise, if a
VAssert fails, the virtual machine stops replaying and allows users to
interact with the program, e.g., through a debugger.
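The record-versus-replay behavior described above can be imitated in ordinary C. The following is a minimal sketch, not the VAssert API: `sim_in_replay`, `sim_checks_run`, `SIM_VASSERT`, `buffer_is_zeroed` and `do_work` are all hypothetical names, and a plain variable stands in for a decision that the virtual machine makes in the real product. The sketch also does not model the virtual machine returning to its pre-assertion state after a successful check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "are we replaying?" state. In the real
   product this is decided by the virtual machine, not by the guest. */
static bool sim_in_replay = false;

/* Counts how many times a checked expression was actually evaluated. */
static int sim_checks_run = 0;

/* Hypothetical macro, NOT the real VAssert_Assert(): the expression is
   evaluated only when sim_in_replay is true and is skipped otherwise. */
#define SIM_VASSERT(expr)        \
    do {                         \
        if (sim_in_replay) {     \
            sim_checks_run++;    \
            assert(expr);        \
        }                        \
    } while (0)

/* A deliberately expensive check: scans the whole buffer. */
static bool buffer_is_zeroed(const unsigned char *buf, int len)
{
    for (int i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/* The same code path runs in both phases; only the check differs. */
static void do_work(const unsigned char *buf, int len)
{
    SIM_VASSERT(buffer_is_zeroed(buf, len));
}
```

Calling `do_work()` with `sim_in_replay` set to false leaves `sim_checks_run` at zero, which mirrors the way the cost of the check is kept out of the recorded run.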
Here is the programming guide to VAssert.
3.3 The APR pools
Apache-httpd uses APR for memory management. All memory blocks
allocated by APR are contained in memory pools. There can be multiple
pools at any given time, and these pools are organized as a tree. The
tree structure ensures all the pools are freed when APR terminates, to
avoid memory leaks.
Memory management related functions in APR are
apr_pool_initialize(), apr_pool_terminate(), apr_pool_create(),
apr_pool_destroy(), apr_palloc(), apr_pcalloc() and apr_pool_clear().
Setting the guards. A good place to set the
buffer-overflow guard is in apr_palloc(), just before the memory
block is returned to the caller.
We modify apr_palloc() to do these things, which are transparent to the caller:
- increase the block size by 1 to make room for the guard
- allocate a block of memory from a pool
- append a guard (a magic number) at the end of the block.
- keep a pointer to the guard in a buffer so it can be checked later
- return the block to the caller
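The steps in this list can be sketched in a few lines of C. This is an illustration under stated assumptions rather than the actual APR patch: `guarded_alloc`, `all_guards_intact`, `guard_table` and `MAX_GUARDS` are hypothetical names, `malloc()` stands in for the pool allocator, and the guard value 0xFA is borrowed from the ASSERT shown in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define GUARD_BYTE 0xFA   /* magic value, as in the ASSERT in this post */
#define MAX_GUARDS 1024   /* hypothetical fixed-size registry for the sketch */

/* Registry of pointers to every guard byte handed out so far. */
static unsigned char *guard_table[MAX_GUARDS];
static size_t guard_count = 0;

/* Hypothetical stand-in for a patched apr_palloc(); malloc() plays the
   role of the pool allocator. Returns NULL on failure. */
static void *guarded_alloc(size_t size)
{
    if (guard_count >= MAX_GUARDS || size == (size_t)-1) {
        return NULL;                           /* registry full or size + 1 would wrap */
    }
    unsigned char *block = malloc(size + 1);   /* one extra byte for the guard */
    if (block == NULL) {
        return NULL;
    }
    block[size] = GUARD_BYTE;                  /* guard sits just past the user data */
    guard_table[guard_count++] = &block[size]; /* remember it for later checks */
    return block;                              /* caller sees an ordinary block */
}

/* Returns 1 if every recorded guard is intact, 0 if any was overwritten. */
static int all_guards_intact(void)
{
    for (size_t i = 0; i < guard_count; i++) {
        if (*guard_table[i] != GUARD_BYTE) {
            return 0;
        }
    }
    return 1;
}
```

A write one byte past the requested size lands on the guard, so `all_guards_intact()` returns 0 afterwards. Freeing blocks and removing their guards from the registry is left out of the sketch.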
The data structure looks like this: [figure not reproduced]
When to check the guards. One time that a guard should be
checked is before the memory block to which it’s appended is freed. So
we can check buffer-overflow in apr_pool_clear() and
apr_pool_destroy(), like this:
- ASSERT(guardOf(mem_block) == 0xFA);
- the rest of work in the original code
However, it can be too late to detect a buffer-overflow only when
memory blocks are freed. What if the block never gets freed? Therefore,
we implemented a function named check_all_pools() to check all guards
in all pools in more function call sites, such as apr_palloc(),
apr_pool_create(), apr_pool_create_ex(), apr_pool_clear(),
apr_pool_destroy() and many more places.
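Because the pools form a tree, a check over all pools amounts to a tree walk. The sketch below shows one way such a walk could look; it is not the authors' check_all_pools(). The `sketch_pool` structure and `check_pool_tree` function are hypothetical, and the real apr_pool_t is laid out differently.

```c
#include <assert.h>
#include <stddef.h>

#define GUARD_BYTE 0xFA   /* magic value, as in the ASSERT in this post */

/* Hypothetical pool layout for this sketch; the real apr_pool_t differs. */
struct sketch_pool {
    struct sketch_pool *first_child;   /* head of this pool's child list */
    struct sketch_pool *next_sibling;  /* next pool with the same parent */
    unsigned char **guards;            /* pointers to this pool's guard bytes */
    size_t guard_count;                /* number of entries in guards */
};

/* Returns 1 if every guard in this pool and in all of its descendants is
   intact, 0 as soon as one overwritten guard is found. */
static int check_pool_tree(const struct sketch_pool *pool)
{
    if (pool == NULL) {
        return 1;
    }
    for (size_t i = 0; i < pool->guard_count; i++) {
        if (*pool->guards[i] != GUARD_BYTE) {
            return 0;
        }
    }
    for (const struct sketch_pool *child = pool->first_child;
         child != NULL;
         child = child->next_sibling) {
        if (!check_pool_tree(child)) {
            return 0;
        }
    }
    return 1;
}
```

The cost of each call grows with the total number of live allocations, which is why calling a check like this from many sites is so expensive when it runs inline.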
Making it virtual. To make the detector “virtual,” we simply
replaced the ASSERT() call to check_all_pools() with a call to
VAssert_Assert(). During the recording of normal Apache execution, the
call to check_all_pools() is skipped and the actual checking of those
guards is deferred until replay time.
We studied the performance of the buffer-overflow detector using ab (Apache Benchmark). With Fedora 8 as the guest OS inside VMware Workstation, we did the following:
- Build and deploy the Apache-httpd into the guest OS, both the
original version and the version with the buffer-overflow detector.
- Use Apache Benchmark (the ab command) to drive httpd:
ab -n10000 http://hostname/sample_file
This command should generate 10000 requests to the “sample_file” on the server and output performance statistics.
- Run the benchmark in both non-recording mode and recording mode.
- The ‘Time taken for tests’ values reported by ab are shown below:
| | Without Detector | Normal Detector | Virtual Detector |
|---|---|---|---|
| without R/R | 6.32 secs | 33.56 secs | 7.13 secs |
| with R/R | 19.31 secs | 59.33 secs | 21.73 secs |
The first row is the time taken without VMware Record and Replay, and
the second row is the time taken while recording. The gap between the
two rows is inflated because we used an unoptimized internal build of
VMware Workstation; the actual overhead of Record and Replay is much
lower.
What is interesting is that the virtual buffer-overflow detector,
even while recording (21.73 secs), beats the normal detector both
without recording (33.56 secs) and with recording (59.33 secs).
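To see how much time each detector adds, subtract the "Without Detector" time in the same row. The short C sketch below does that arithmetic on the numbers reported above; the function and constant names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Seconds reported by ab, copied from the results in this post. */
static const double BASE_NO_RR = 6.32;
static const double NORMAL_NO_RR = 33.56;
static const double VIRTUAL_NO_RR = 7.13;
static const double BASE_RR = 19.31;
static const double NORMAL_RR = 59.33;
static const double VIRTUAL_RR = 21.73;

/* Extra seconds a detector adds on top of the run without a detector. */
static double detector_overhead(double with_detector, double baseline)
{
    assert(with_detector >= baseline);
    return with_detector - baseline;
}

/* Prints the added seconds for both detectors in both modes. */
static void print_overheads(void)
{
    printf("without R/R: normal +%.2f s, virtual +%.2f s\n",
           detector_overhead(NORMAL_NO_RR, BASE_NO_RR),
           detector_overhead(VIRTUAL_NO_RR, BASE_NO_RR));
    printf("with R/R:    normal +%.2f s, virtual +%.2f s\n",
           detector_overhead(NORMAL_RR, BASE_RR),
           detector_overhead(VIRTUAL_RR, BASE_RR));
}
```

By this subtraction the normal detector adds 27.24 seconds without recording and 40.02 seconds with recording, while the virtual detector adds 0.81 and 2.42 seconds respectively.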
We didn’t measure the speed of replaying the virtual detector. It is
much slower than recording, yet still deterministic. We imagine the
replay runs can be done overnight, since all replaying can run
unattended. The only cost is machine time.
We found VAssert is effective in removing program-checking overhead
at runtime. It is relatively easy to add the VAssert detector to
existing programs, which mostly use assertions already. Since VAssert
checks are not performance sensitive, programmers may concentrate on
correctness, not performance, when implementing their program checkers.