Welcome to ftrace & the Start of Your Journey to Understanding the Linux Kernel!

After giving another talk on ftrace, the official tracer of the Linux kernel, I thought it might be useful to offer a quick primer for those who are still unfamiliar with it. Ftrace was added to the Linux kernel back in 2008, but a lot of people still don’t quite get what it is, or what it can empower them to do.

To put it very simply: ftrace is a Linux kernel feature that lets you trace Linux kernel function calls. Essentially, it lets you look into the Linux kernel and see what it’s doing. Why would you want to do that? Well, most of the time you wouldn’t. But when you do, it is an essential ability to have!

While a task is in user space, it has no visibility into the happenings of the kernel. The kernel runs in a privileged mode that won’t let you see what’s going on inside. And we want it to be that way – Linux would open itself to major security risks if we made the kernel’s workings visible to everyone all the time.

But when you run into problems, you really do need a window into the kernel and that’s what tracing gives you. Say you write a device driver, debug it, and confirm that everything’s fine. Then you run it inside the kernel and find that it’s not working as you expected, or something else stalls or breaks when you run it. That’s when you need to see what is happening at the kernel level.

Ftrace actually stands for “function tracer,” and its ability to trace functions is what first made the tool popular. Under the hood, ftrace is built around a clever lockless ring buffer, and that buffer stores all of the trace data. It allows you to see all, or a selection of, functions within the kernel and watch the flow of their execution. You can watch this live, although that’s usually too fast to follow, or you can record it into a file and examine the flow of execution at a later time.
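As a minimal sketch of what that looks like in practice (this assumes a modern kernel where tracefs is mounted at /sys/kernel/tracing; on older kernels use /sys/kernel/debug/tracing, and everything here needs root):

```shell
# Enable the function tracer and peek at the recorded function calls.
cd /sys/kernel/tracing           # or /sys/kernel/debug/tracing on older kernels
echo function > current_tracer   # start tracing every kernel function
sleep 1                          # let some activity land in the ring buffer
head -20 trace                   # look at the start of the captured flow
echo nop > current_tracer        # turn tracing back off
```

Even this tiny session records thousands of function calls per second, which is why recording to a file and analyzing later is usually more practical than watching live.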

This is incredibly useful because it can tell you why your code isn’t working. Maybe the code stalled because it was blocking on locks for a reason you were not previously aware of. Perhaps it triggered a soft interrupt that you were not expecting. A properly composed ftrace query will let you figure out exactly what’s going on, and in that sense, it’s a valuable debugging tool.

Ftrace has two meanings: one refers specifically to the infrastructure of function hooks, and the other to the tracing framework built on the tracefs file system interface. Within the framework, there are some key files, including current_tracer (which sets and displays the currently configured tracer), available_tracers (which lists the tracers compiled into the kernel), and trace (which holds the output of the trace in a human-readable format).

Other tools built on the function hooks include the function graph tracer (which traces not only the function entry but also the function return, allowing you to build a call graph of the function flow), the stack tracer (which shows which functions are using the most stack space), kprobes (dynamic events) and even live kernel patching!
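To make the first two of these concrete, here is a sketch of driving the function graph tracer and adding a dynamic kprobe event. The event name “myopen” and the probed function are illustrative choices; available function names vary by kernel version, so check available_filter_functions first:

```shell
cd /sys/kernel/tracing

# Trace entry *and* exit of kernel functions as an indented call graph.
echo function_graph > current_tracer
head -20 trace
echo nop > current_tracer

# Add a dynamic kprobe event on a kernel function, then enable it.
# "myopen" is a hypothetical event name; pick a function that exists
# in your kernel (grep available_filter_functions to find one).
echo 'p:myopen do_sys_openat2' >> kprobe_events
echo 1 > events/kprobes/myopen/enable
head -20 trace
echo 0 > events/kprobes/myopen/enable
echo '-:myopen' >> kprobe_events   # remove the probe when done
```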

The rest of the ftrace framework is much more involved. There are ways to trace latency. The ftrace infrastructure also created the static trace events, which even perf utilizes; there are over a thousand in the current Linux kernel, with more being added every day. The static trace events cover the various events within the kernel that the maintainers feel are important. Examples include when tasks are scheduled, when interrupts are triggered, when network packets are processed, and much more. The trace events contain all the details that the maintainers can use to debug a system that is in production, by recording the events as issues occur and analyzing the data offline.
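Enabling a static trace event is just as echo-and-cat friendly as the function tracer. A quick sketch using the scheduler’s context-switch event (again assuming root and tracefs at /sys/kernel/tracing):

```shell
cd /sys/kernel/tracing
grep sched available_events | head            # discover scheduler events
echo 1 > events/sched/sched_switch/enable     # record every context switch
sleep 1
head -20 trace                                # each line is one switch, with details
echo 0 > events/sched/sched_switch/enable     # disable the event again
```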

The trace events have an entire infrastructure of their own. This includes simple histograms, triggering stack traces, starting and stopping tracing, and enabling or disabling other trace events. There are even synthetic events that can be created by associating two events with a common field (like a wake-up event and the scheduler event that fires when the woken task gets scheduled). Using synthetic events, you can trace and record custom latencies.
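The simplest piece of that infrastructure to try is a histogram trigger. This sketch attaches one to the context-switch event, keyed on the command name, so you get per-task switch counts without recording every individual event:

```shell
cd /sys/kernel/tracing
# Attach a histogram trigger keyed on the task's command name.
echo 'hist:keys=comm' > events/sched/sched_switch/trigger
sleep 1
cat events/sched/sched_switch/hist            # per-task context-switch counts
echo '!hist:keys=comm' > events/sched/sched_switch/trigger   # remove the trigger
```

The same trigger syntax scales up to the synthetic-event latency measurements mentioned above; the kernel’s histogram documentation walks through the full wakeup-latency example.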

Finally, ftrace is a great tool simply for learning. The Linux kernel is now so vast that none of us can possibly know every corner of it (not even Linus Torvalds himself). If I have to learn a new subsystem, I enable ftrace, perform some work, and then analyze the tracing data to understand how that subsystem works. I wish I had ftrace when I first started kernel development.

If you want to explore ftrace for the first time, the official ftrace documentation is a good place to start. Ftrace is both simple and flexible to use. You can drive it with nothing but echo and cat commands. Ftrace has always been designed to work with busybox or toybox, without any add-on utilities.

Alex Dzyoba wrote a useful introduction to ftrace a few years back, and more recently Andreas Christoforou wrote the valuable “Kernel Tracing with Ftrace.” I’d also recommend Julia Evans’ blog post “ftrace: trace your kernel functions!” She discusses the difficulty of running ftrace like this:

cd /sys/kernel/debug/tracing
echo function > current_tracer
echo do_page_fault > set_ftrace_filter
cat trace

And then explores a more user-friendly interface called trace-cmd. It’s a phenomenal place to start your ftrace journey and is accompanied well by Brendan Gregg’s “ftrace: The Hidden Light Switch.”
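With trace-cmd, the whole enable/record/disable dance collapses into a couple of commands. A minimal sketch, recording scheduler events around a workload and examining the resulting trace.dat offline:

```shell
# Record scheduler switch and wake-up events while a workload runs
# (here just "sleep 1"), then view the captured trace.dat afterwards.
trace-cmd record -e sched_switch -e sched_wakeup sleep 1
trace-cmd report | head -20
```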

I’ve also written several articles of my own on debugging the kernel using Ftrace (part 1 and part 2), using the ftrace function tracer, and trace-cmd.

If you are an audio-visual learner (careful, I can speak quite fast), you can check out my Kernel Recipes talk on Understanding the Linux Kernel via Ftrace, or my 2019 ftrace tutorial titled “See what your computer is doing with Ftrace utilities.”

Lastly, a talk I gave in Sofia last year offers a perspective on using ftrace to learn about the Linux kernel.

Stay tuned to the Open Source Blog and follow us on Twitter (@vmwopensource) for all the latest open source news and updates from the VMware Open Source Project Office.


3 comments have been added so far

  1. My understanding of VMware’s vprobes is that they are a bit more complex to set up. They do have static probes (like trace events) and dynamic ones (like kprobes), but for the static ones, I don’t believe there’s any way to discover what’s there without already knowing they exist. Ftrace shows all the available trace events and kprobe events in the tracefs directory, and makes it much easier to add new ones and enable them. All you need is a busybox interface (basically cat and echo) to control it.

    Another difference is that ftrace is more of an assortment of utilities and infrastructure. The “trace point” is a hook that anything can attach to (even more than one callback). The “trace event” is more of a “fixed” set of data to retrieve from the trace point. The same goes for kprobe events: a kprobe is just a dynamic hook into the code that passes some data or a pointer to a routine, and the kprobe event specifies what to do with that data. Trace points and kprobes are used by more than just trace events, as eBPF is built on top of them as well. My understanding of vprobes is that they are more equivalent to just the trace event and kprobe event. But I’m hoping that we can change all this and make vprobes as versatile and easy to use as trace points (and kprobes) and trace/kprobe events.

  2. Thanks for a quick overview. I will try to use ftrace and see how it can help me. I came into this page while searching for a way to “trace” user mode shared libraries. I found a tool called ltrace but it does not work. I am hoping tools become available for user mode objects as well.
