4.8 The God's-Eye View of Process Tracing — Exploring execve with perf and eBPF Tools

In the previous section, we demonstrated that dynamic kprobes are virtually omnipotent when it comes to tracing kernel module functions. But that was for "our own code."

Now, we turn our attention to a more fundamental and universal question: Who is doing what in the system?

Specifically: How are new processes born?

If you felt a sense of mastery during the hands-on exercises at the end of the last section, this section will bring you up against a wall: traditional kprobe methods fail on certain critical paths. This forces us to reach for a more powerful weapon: eBPF.

execve: The Source of Processes

In the Linux (and, more broadly, UNIX) world, launching a user-space program as a process ultimately relies on the so-called exec family of functions.

You might have seen this bunch of names: execl(), execlp(), execv(), execvp(), execle(), execvpe(), and execve().

Among this entire family of APIs, only one is the true "boss."

The first six (execl through execvpe) are essentially just glibc wrapper functions. Their job is to organize arguments, convert various calling conventions into a unified standard, and ultimately call execve().

Only execve() is the actual system call.

A quick fact: execvpe() is actually a GNU extension (found almost exclusively on Linux).

What does this mean? It means that the execution of almost all processes (and applications) ultimately boils down to the execve() code path in the kernel!

Once inside the kernel, the execve() system call transforms into the kernel function sys_execve() (which is actually defined indirectly via the SYSCALL_DEFINE3() macro), and this function calls the real workhorse: do_execve().
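For the curious, here is what that entry point looks like in the 5.10 kernel's fs/exec.c (lightly trimmed):

/* fs/exec.c (Linux 5.10, lightly trimmed): the execve(2) entry point.
 * The SYSCALL_DEFINE3() macro expands into the arch-specific syscall
 * stub, which immediately hands off to the do_execve() workhorse. */
SYSCALL_DEFINE3(execve,
		const char __user *, filename,
		const char __user *const __user *, argv,
		const char __user *const __user *, envp)
{
	return do_execve(getname(filename), argv, envp);
}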

Kernel Mapping Pattern for System Calls

This is a typical pattern in the kernel, though not an absolute rule:

A system call foo() issued from user space usually becomes sys_foo() in the kernel. If the sys_foo() code is short, it handles the work itself; if the logic is more complex, it calls a do_foo() function to do the actual work.

Take execve(2) as an example. Its path is: fs/exec.c:sys_execve() → fs/exec.c:do_execve().

But that's only half the story.

Look at the figure below. It shows how a user-space call falls into the kernel, and you'll notice that things aren't quite as simple as they appear (for example, open(2)).

(Figure 4.13 – How user-space system calls map to the kernel)

The Red Line: How Does User Space Cross into Kernel Space?

As a useful piece of background: how exactly does an unprivileged user-space task (process or thread) cross that red boundary line (the vertical line in Figure 4.13), jumping from user mode into privileged kernel mode?

Simply put, each processor supports one or more machine instructions to do this, commonly referred to as call gates or traps. We say that a process "traps" from user space into kernel space.

  • x86: Traditionally the software interrupt int 0x80; on modern x86_64, the syscall machine instruction.
  • ARM-32: Uses the SWI (software interrupt) instruction.
  • AArch64 (ARM64): Uses the SVC (supervisor call) instruction.

If you're interested in the details, check out the man page for syscall(2).
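To make the "trap" concrete, here is a minimal, x86_64-only user-space sketch that invokes write(2) directly via the raw syscall instruction, roughly what the glibc wrapper boils down to (the helper name is my own):

/* raw_write.c: call write(2) via the x86_64 'syscall' instruction,
 * bypassing the glibc wrapper entirely. Build with: gcc raw_write.c */
#include <sys/syscall.h>    /* SYS_write */

static long raw_syscall3(long nr, long a1, long a2, long a3)
{
	long ret;
	/* x86_64 syscall ABI: syscall number in rax; args in rdi, rsi, rdx.
	 * The syscall instruction itself clobbers rcx and r11. */
	__asm__ __volatile__("syscall"
			     : "=a"(ret)
			     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
			     : "rcx", "r11", "memory");
	return ret;
}

int main(void)
{
	const char msg[] = "hello from kernel mode!\n";
	raw_syscall3(SYS_write, 1, (long)msg, sizeof(msg) - 1);
	return 0;
}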

Alright, back to the main topic. There's also a near-twin called execveat(). It's almost identical to execve(), except that its first argument is a directory file descriptor and the pathname (the second argument) is interpreted relative to that directory; it also takes a trailing flags argument.
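For comparison, the two prototypes as given in their man pages:

int execve(const char *pathname,
           char *const argv[], char *const envp[]);
int execveat(int dirfd, const char *pathname,
             char *const argv[], char *const envp[],
             int flags);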

Hitting a Wall: The Limitations of Traditional kprobes

Since we know that all processes are executed via execve(), intuition tells us that to monitor "who executed what," we should insert a probe on this function.

For example, injecting a kprobe into sys_execve() or do_execve().

Sounds perfect, right?

But there's a "but."

On modern kernels, this trick doesn't work. If you try the static kprobe approach (writing a kernel module that calls register_kprobe()), the operation fails outright. A key reason: on modern x86_64 kernels there is no symbol actually named sys_execve; the syscall entry stub is generated as __x64_sys_execve, and internal functions like do_execve() may be inlined away by the compiler, leaving nothing to probe.

Don't take my word for it — try it yourself. Remember, always let experiments do the talking.
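Here's a minimal sketch of that experiment, assuming the usual out-of-tree module build setup from earlier chapters (the file name and messages are my own):

// kp_execve.c: a minimal kprobe module sketch. On a modern x86_64
// kernel, register_kprobe() on "sys_execve" is expected to FAIL
// (typically -ENOENT: no symbol by that name exists).
#include <linux/module.h>
#include <linux/kprobes.h>

static struct kprobe kp = {
	.symbol_name = "sys_execve",
};

static int __init kp_init(void)
{
	int ret = register_kprobe(&kp);

	if (ret < 0) {
		pr_info("register_kprobe() failed, ret = %d\n", ret);
		return ret;
	}
	pr_info("kprobe planted at %px\n", kp.addr);
	return 0;
}

static void __exit kp_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(kp_init);
module_exit(kp_exit);
MODULE_LICENSE("GPL");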

In fact, on my x86_64 Ubuntu 20.04 LTS virtual machine, even execsnoop-perf, the wrapper tool designed for precisely this job (internally, it drives ftrace's kprobe_events interface), fails:

$ sudo execsnoop-perf
Tracing exec()s. Ctrl-C to end.
ERROR: adding a kprobe for execve. Exiting.
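You can reproduce the underlying failure at the raw ftrace level, too. Roughly, this is the kind of probe definition execsnoop-perf tries to register (the group/event name here is my own invention):

$ echo 'p:mygrp/execve sys_execve' | sudo tee /sys/kernel/debug/tracing/kprobe_events

On a modern x86_64 kernel the write fails: there is simply no symbol named sys_execve for the kprobe to attach to.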

That's quite awkward. We need a sharper knife.

The Ultimate Weapon: eBPF Enters the Scene

The problem that stumped perf-tools is solved in one stroke by the more modern eBPF tools.

Simply install and run (as root) execsnoop-bpfcc(8), and it works perfectly!

Next, let's take a quick peek at how to trace process execution through an eBPF frontend.

What is eBPF?

eBPF stands for extended BPF (extended Berkeley Packet Filter). Old-school BPF was primarily used for filtering network packets in the kernel (tcpdump is the classic user). eBPF is a relatively new kernel innovation (supported since Linux kernel 4.1, released in June 2015).

It massively expands the BPF concept, allowing you to trace not just the network stack, but almost anything — whether in kernel space or user-space applications.

In fact, eBPF and its frontend toolsets have become the modern standard practice for tracing and performance analysis on Linux systems.

To use eBPF, your system needs to meet two conditions:

  1. Linux kernel version 4.1 or newer.
  2. eBPF support enabled in the kernel.
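A quick way to sanity-check the second condition is to grep the kernel config; exact option names and values vary a little across versions, but on a typical eBPF-capable kernel you'd expect something like:

$ grep -w -e CONFIG_BPF -e CONFIG_BPF_SYSCALL -e CONFIG_BPF_EVENTS /boot/config-$(uname -r)
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_EVENTS=y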

Using the low-level eBPF kernel features directly is extremely hardcore (bordering on "hellish difficulty"), so the community has developed several more user-friendly frontend tools:

  • BCC (BPF Compiler Collection)
  • bpftrace
  • libbpf + BPF CO-RE (Compile Once – Run Everywhere)
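To give you a taste of the middle option, here is the classic bpftrace one-liner for precisely our use case; note that it attaches to the stable sys_enter_execve tracepoint rather than to a kernel function symbol (assuming you have bpftrace installed):

$ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(args->filename)); }'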

The easiest way to get started is to install the BCC binary packages.
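On Ubuntu and Debian, for instance, that boils down to a single command (package names differ on other distributions):

$ sudo apt install bpfcc-tools linux-headers-$(uname -r)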

⚠️ Installation Tip: You can install the BCC toolkit following the official guide. But on some older distributions (like Ubuntu 18.04), directly installing bpfcc-tools might only work for pre-built distribution kernels. This is because the installation process depends on the linux-headers-$(uname -r) package, which only exists for distribution kernels; a matching one might not be found for our custom 5.10 kernel. However, on Ubuntu 20.04 LTS, it usually works fine even when running a custom kernel.

After installing bpfcc-tools, you can get a feel for its massive tool library with the following command:

dpkg -L bpfcc-tools | grep "^/usr/sbin.*bpfcc$"

On my x86_64 Ubuntu 20.04 LTS guest machine (running our custom 5.10.60-prod01 kernel), this command shows a full 112 *-bpfcc tools installed (they are actually Python scripts).

Hands-on: execsnoop-bpfcc

Earlier in this section, we failed to trace execve() using perf-tools. Now that the eBPF BCC frontend is ready, let's try again:

$ uname -r
5.10.60-prod01
$ sudo execsnoop-bpfcc 2>/dev/null
[...]
PCOMM PID PPID RET ARGS
id 7147 7053 0 /usr/bin/id -u
id 7148 7053 0 /usr/bin/id -u
git 7149 7053 0 /usr/bin/git config --global credential.helper cache --timeout 36000
cut 7151 7053 0 /usr/bin/cut -d= -f2
grep 7150 7053 0 /usr/bin/grep --color=auto ^PRETTY_NAME /etc/os-release
cat 7152 7053 0 /usr/bin/cat /proc/version

ip 7157 7053 0 /usr/bin/ip a
sudo 7159 7053 0 /usr/bin/sudo route -n
route 7160 7159 0 /usr/bin/sudo route -n
[...]

Look at that: it works like a charm.

Whenever a process executes in the system, execsnoop-bpfcc prints a line of details telling you who just did what. Notice that it even displays the full arguments of the executed command!

We recommend running the tool with the -h option to see its help page, or checking the man page, which contains many practical one-liner examples.
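A couple of illustrative invocations (the flags shown here exist in the BCC version I used; verify them against -h on yours):

$ sudo execsnoop-bpfcc -t -x     # add timestamps; also show failed exec()s
$ sudo execsnoop-bpfcc -n git    # only show exec()s whose command matches "git"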

Just like perf-tools, all *-bpfcc scripts must be run as root. Also, the tool can emit a burst of warning noise right after startup, which is why I redirected stderr (2>/dev/null) above to keep the output clean.

Recap: opensnoop-bpfcc

Remember that old example from the very beginning of this chapter? Tracing do_sys_open().

We can do it with BCC too, and it's even simpler:

$ sudo opensnoop-bpfcc 2>/dev/null
PID COMM FD ERR PATH
1431 upowerd 9 0 /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/PNP0C0A:00/power_supply/BAT0/voltage_now
1431 upowerd 9 0 /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/PNP0C0A:00/power_supply/BAT0/capacity
1431 upowerd -1 2 /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/PNP0C0A:00/power_supply/BAT0/temp
[...]
431 systemd-udevd 14 0 /sys/fs/cgroup/unified/system.slice/systemd-udevd.service/cgroup.procs
431 systemd-udevd 14 0 /sys/fs/cgroup/unified/system.slice/systemd-udevd.service/cgroup.threads
[...]
^C
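opensnoop-bpfcc supports similar filtering; a couple of illustrative invocations (again, verify the flags with -h on your version):

$ sudo opensnoop-bpfcc -x          # show only failed open()s
$ sudo opensnoop-bpfcc -p 1431     # trace only the given PID (upowerd, above)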

If you want to dive deep into the capabilities of eBPF tracing tools, Brendan Gregg's page is a must-visit: https://www.brendangregg.com/ebpf.html


Chapter Echoes

This chapter was long, but we walked the complete path from "backyard iron smelting" to a "modern factory."

The core takeaway of this chapter is: Observability is the cornerstone of system debugging. Initially, we had to hand-write kernel modules to insert probes — that was the "static" era, where changing a single line of code required recompilation, posing enormous risks in production environments.

Later, we mastered ftrace and kprobe_events, which marked the beginning of the "dynamic" era: without writing a single line of C code, we could insert probes on the fly through the debugfs interface alone. This made production debugging feasible.

Finally, when we hit a roadblock on a special system call like execve(), the emergence of eBPF pushed everything to new heights. It not only solved compatibility issues but also provided a sandboxed, safe, high-performance kernel execution environment.

Remember that question from the beginning — how to peek into the internal workings of a system? Now you hold not just a microscope, but an endoscope. Whether it's file opens, process executions, or various kernel events you might encounter in the future, you have a way to capture data without stopping services.

Before moving on, we strongly suggest you pause. Go type a few commands, try writing a simple kprobe script, or get execsnoop running on your board. You can't learn the kernel just by reading — you have to get your hands dirty.

In the next chapter, we'll shift our focus from "code execution" to "memory management." That's a more headache-inducing but even more rewarding domain to conquer. Are you ready to deal with OOM (Out Of Memory) and memory leaks?

See you in the next chapter.