Reconsidering How and What We Trace

If you look at the cyclical nature of how computers work, the saying “what’s old is new again” applies surprisingly often. Sometimes a problem looks different and, upon deeper inspection, turns out to be the same problem we’ve always had. Sometimes we branch out, and the lessons we learned before simply need to be re-learned. With that in mind, let’s take a look at containers, and their workloads, today.

Containers are a fascinating technical solution, and they make a number of things easier, particularly from a DevOps perspective. One of the things containers provide is more isolation from the rest of the running system, though not the explicit isolation of a VM. That makes looking into how a container works a bit problematic. There are some ways around it: you can build debugging facilities into the container itself, or attempt to debug from within the container in other ways, but that doesn’t always give you the full picture. Remember that a container, while somewhat isolated, is not the only thing running on a system, and you may need to broaden your view to determine what’s going on.

Getting Assistance from the Linux Kernel

From a systems perspective, the way you broaden your view is actually to dig deeper. In this case, that means going down to the underlying kernel and asking it for help. The modern containerized world predominantly runs on, and relies on, the Linux Kernel, which has a long history of built-in tracing functionality for determining what’s going on inside the kernel itself and seeing how user space runs on top of it. We can use this functionality to our advantage in gaining insights, because the Linux Kernel sits in a unique place in the stack and, by definition, has access to everything. The impediment to using these tools (ftrace, perf, etc.) is that they were built for working on the Linux Kernel itself, which makes them effectively unapproachable for, say, a Python programmer. The amount of knowledge you would need about the inner workings of the kernel just to get useful information out of it is significant, and it makes quickly asking general questions about how the system behaves nearly impossible.
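To give a sense of that barrier, here is a minimal sketch of driving ftrace directly through the tracefs filesystem from Python, with no helper libraries at all. It assumes root privileges and a tracefs mount at /sys/kernel/tracing (the path can vary by distribution), and it is only meant to illustrate how low-level the raw interface is:

TRACEFS = '/sys/kernel/tracing'                                                        # tracefs mount point (may differ per distro)

def tracefs_write( path, value ):                                                      # tracefs is controlled by writing plain strings
    with open( path, 'w' ) as f:
        f.write( value )

tracefs_write( TRACEFS + '/events/sched/sched_wakeup/enable', '1' )                    # enable a single static tracepoint
tracefs_write( TRACEFS + '/tracing_on', '1' )                                          # start recording into the ring buffer

with open( TRACEFS + '/trace_pipe' ) as pipe:                                          # raw, human-formatted event stream
    for _ in range( 10 ):
        print( pipe.readline(), end='' )

tracefs_write( TRACEFS + '/tracing_on', '0' )                                          # stop tracing again

Everything here is strings poked into magic files, and making real sense of the output means already knowing the kernel’s event formats; that is exactly the gap the libraries described below are meant to close.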

Lowering the Barrier to Entry for More Users and Developers

So if the Linux Kernel’s tracing subsystem is difficult to approach, why am I about to advocate using it to help us trace containers? Well, the answer is that we are working on making it more approachable. There is a series of efforts, from libtracefs, libtraceevent, and trace-cmd all the way up to kernelshark, to make these low-level pieces more approachable for developers and users outside the Linux Kernel. We are currently working on a VMware-originated open source project called trace-cruncher, a binding library intended to bring the means of interfacing with these intermediate libraries, all the way down to the Linux Kernel, into normal higher-level languages on their own terms. Python is already supported, which means we can quickly start doing some far more interesting tracing.

For example, let’s take an Nginx server serving a static website that is taking a long time to load for no logical reason. Yes, this is a somewhat silly example, but it’s simple and easy to relate to, so bear with me! Because the container is so simple, there aren’t many places inside it to instrument, so to debug this we need to look at the broader system. Using trace-cruncher, we can write a quick script to watch the process and see what happens.

Note: The trace-cruncher API is still in flux, and the script below is meant as an example, not something to cut and paste verbatim. For examples that are updated along with the code itself, refer to GitHub.
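The script takes the host-side PID of the nginx process as its only argument. One way to find that PID for a podman container is to ask podman itself; this is a small sketch, and the container name my-nginx is made up for the example:

import subprocess

# '{{.State.Pid}}' prints the host PID of the container's main process
pid = subprocess.check_output( ['podman', 'inspect', '--format', '{{.State.Pid}}', 'my-nginx'],
                               text=True ).strip()
print( pid )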

import sys

import tracecruncher.ftracepy as ft
import tracecruncher.ft_utils as tc

swake_events = tc.event( 'sched', 'sched_wakeup' )                                     # Trace when a process is woken up
tep = tc.local_tep()

def callback( event, record ):                                                         # Callback, print info on traced events
    print( tep.info( event, record ) )

if __name__ == "__main__":
    pid = sys.argv[1]

    inst = ft.create_instance( tracing_on=False )                                      # Create new ftrace instance

    swake_events.enable( instance=inst )                                               # Enable the tracing

    swake_events.set_filter( filter='comm ~ "' + str(pid) + '"',                       # Filter based on the pid we are tracing
                             instance=inst )

    ft.hook2pid( instance=inst, pid=int(pid), fork=True )                              # Hook the tracing to the pid as well

    print( 'Ready to trace PID from CLI: ' + pid + ' ...' )
    ft.iterate_trace( instance=inst, callback='callback' )                             # Attach the trace to the callback

There’s really nothing substantial in this simplified example code: a couple of imports, the setup of what we want to trace (processes being woken up by the scheduler), a callback that prints out what happened, and a main function that takes the pid of the nginx process as a command-line argument, sets a filter on the associated pids, and starts the trace.

If you run this script and reload the website, you’ll get output that looks something like this:

$ sudo python3 test1.py 1638209
Ready to trace PID from CLI: 1638209 ...
 sched_wakeup: comm=nginx pid=1638209 prio=120 target_cpu=000
 sched_wakeup: comm=fuse-overlayfs pid=1638164 prio=120 target_cpu=006
 sched_wakeup: comm=python3 pid=3215454 prio=120 target_cpu=001
 sched_wakeup: comm=nginx pid=1638209 prio=120 target_cpu=000
 sched_wakeup: comm=exe pid=1638170 prio=120 target_cpu=005
 sched_wakeup: comm=python3 pid=3215454 prio=120 target_cpu=001
 sched_wakeup: comm=conmon pid=1638189 prio=120 target_cpu=006
 sched_wakeup: comm=fuse-overlayfs pid=1638164 prio=120 target_cpu=007

So we see nginx wake up, and then, while it’s processing, fuse-overlayfs and in turn python3 get woken as well. This particular container environment, running podman, makes use of fuse-overlayfs, and while fuse-overlayfs provides a lot of advantages, fuse itself can sometimes be a performance bottleneck (in this made-up example, that is where I’m pointing the finger). This tracing gives us insight we wouldn’t have had from inside the container, and that the container itself couldn’t know about: that the underlying filesystem may be the performance bottleneck.
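To go a step further and quantify how often those other processes wake up while the site loads, the callback can be extended to aggregate events per command instead of just printing them. This is a sketch against the same in-flux trace-cruncher API used above, and the parsing of the formatted event line is an assumption based on the output shown; only the counting logic is new:

from collections import Counter

wakeups = Counter()

def callback( event, record ):
    info = tep.info( event, record )                                                   # e.g. 'sched_wakeup: comm=fuse-overlayfs pid=... prio=... target_cpu=...'
    comm = info.split( 'comm=' )[1].split()[0]                                         # pull the command name out of the formatted line
    wakeups[comm] += 1

# After the run, wakeups.most_common() shows which commands dominate the wakeups,
# making a storage-layer culprit like fuse-overlayfs easy to spot.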

Join the Community

This is one vision of where tracing is heading, but these libraries and language bindings can lead in many other directions as well, resulting in a plethora of new tools yet unknown. If you’re interested in playing around with this script, please note that trace-cruncher is still in fairly early development. The upside is that the more feedback we get early on, the better our direction will be going forward. I invite you to visit the trace-cruncher GitHub repository and, if you are so inclined, to take a look at the pre-built packages for Fedora and CentOS on COPR, where we currently have a fairly up-to-date set of rpm packages available. (Sorry .deb users, I don’t have that set up yet.)

Stay tuned to the Open Source Blog and follow us on Twitter for more deep dives into the world of open source contributing.