
10.2 When the Kernel Gives Up — A Complete Guide to the Panic Mechanism

To conquer this beast, you must first understand it.

We set up the environment in the previous section. Now, let's trigger a crash ourselves.


Starting with panic(): The Kernel's Last Words

The core code for handling Panics resides in kernel/panic.c, and its heart is the panic() function. It takes a printf-style format string (plus the matching arguments), prints the kernel's last words, and then brings the system to a complete halt.

// kernel/panic.c
/**
 * panic - halt the system
 * @fmt: The text string to print
 *
 * Display a message, then perform cleanups.
 * This function never returns.
 */
void panic(const char *fmt, ...)
{ [...]

Obviously, we can't just call this casually. Once invoked, it means the kernel has entered an "unrecoverable, beyond repair" state. The system immediately ceases all meaningful operation.

Let's create some chaos ourselves.


First Detonation: Write a Module to Crash It

For empirical evidence (strictly within our test VM, of course), let's write the simplest possible module that directly calls panic().

The code is extremely straightforward, requiring no complex initialization logic:

// ch10/letspanic/letspanic.c
static int myglobalstate = 0xeee;

static int __init letspanic_init(void)
{
	pr_warn("Hello, panic world\n");

	panic("whoa, a kernel panic! myglobalstate = 0x%x",
	      myglobalstate);
	return 0; /* success */
}
module_init(letspanic_init);

Since we're calling panic() specifically to crash the system, writing a cleanup callback is pointless—the system will never reach that point.

Compile and insert the module.
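If you don't have the book's build setup at hand, a minimal out-of-tree build looks like this (a sketch; it assumes the headers for your running kernel are installed):

```shell
# A one-line kbuild Makefile is enough for this module:
#   obj-m := letspanic.o
# Then build against the running kernel's build tree:
make -C /lib/modules/"$(uname -r)"/build M="$PWD" modules
```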

I'm doing this on my trusted x86_64 Ubuntu 20.04 LTS VM (running a custom 5.10.60-prod01 kernel), connected via SSH:

$ sudo insmod ./letspanic.ko
[... <panicked, and hung> ... ]

The moment it goes in—the world goes silent.

The SSH terminal produces no output, and the VM's graphical interface freezes. The system has clearly Panicked, but here's the problem: we're blind.

To debug it, we at least need to see the diagnostic information the kernel spits out. This information looks very similar to the Oops log format we covered in Chapter 7 (if you've forgotten that chapter, now is a great time to go back and review it).

Since neither the local console nor SSH can show us the logs (because the kernel scheduler has stopped, and interrupts might even be disabled), what do we do?

The answer is netconsole! But before that, let's introduce a "backdoor" to trigger a Panic without writing any code.


Blowing Things Up Without Code: Magic SysRq

Sometimes you just want to test the Panic flow but don't want to write a module. The kernel has a built-in backdoor: Magic SysRq.

Combined with the panic_on_oops parameter, three lines of commands are all it takes to bring the kernel to its knees:

echo 1 > /proc/sys/kernel/panic_on_oops
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

Here's what these three steps actually do:

  1. Set the mindset: Tell the kernel that if it encounters an Oops, immediately escalate it to a Panic—no hesitation.
  2. Open the backdoor: Ensure the Magic SysRq feature is enabled (some distributions disable it by default for security).
  3. Detonate: Trigger a crash via SysRq (c stands for Crash).
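The first two steps are ordinary sysctl tunables, so you can also set them with sysctl(8) instead of raw echo-to-procfs (needs root):

```shell
sudo sysctl -w kernel.panic_on_oops=1   # escalate any Oops to a Panic
sudo sysctl -w kernel.sysrq=1           # enable all Magic SysRq functions
```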

What is Magic SysRq?

You can think of it as a keyboard shortcut reserved for God.

It allows system administrators (or developers) to force the kernel down code paths that are normally unreachable. When the system hangs, this is incredibly useful: a direct tunnel into the kernel.

This feature must be enabled at compile time (CONFIG_MAGIC_SYSRQ=y). For security, you can fine-tune its permissions by writing to the /proc/sys/kernel/sysrq pseudo-file:

  • Write 0: Completely disable.
  • Write 1: Enable all functions (God mode).
  • Write a bitmask: Enable a specific combination of functions.

The default value is usually determined by the kernel config option CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE, typically 1.
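For example, to permit only the "safer" functions, you can OR together the relevant bit values (taken from the kernel's Documentation/admin-guide/sysrq.rst) and write the result to the pseudo-file; a quick sketch:

```shell
# 16 = sync, 32 = remount read-only, 128 = reboot/poweroff
mask=$(( 16 | 32 | 128 ))
echo "$mask"    # the value to write into /proc/sys/kernel/sysrq
# sudo sh -c "echo $mask > /proc/sys/kernel/sysrq"   # apply it (needs root)
```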

What it can do is pretty "aggressive":

  • c: Force a crash (which is what we used in sysrq-trigger).
  • b: Cold reboot, no questions asked.
  • o: Force a power off.
  • f: Summon the OOM Killer to terminate processes.
  • s: Emergency Sync, forcefully flushing memory data to disk.
  • u: Emergency unmount of all filesystems.

From a debugging perspective, its information-gathering features are the real core:

  • l: Display stacks of active tasks on all CPUs.
  • p: Display CPU registers.
  • q: Display kernel timers.
  • w: Display blocked tasks.
  • z: Dump all ftrace buffers.

How do we use it? Two ways:

  1. Interactive: Press the key combination directly. On x86, it's Alt + SysRq + <command key>. Note that on some keyboards, SysRq and Print Screen are the same key.
  2. Non-interactive: Which is what we just did—write characters to /proc/sysrq-trigger.

If you write an unsupported character such as ? to it, the kernel prints a cheat sheet of the available commands into the kernel log (view it with dmesg):

echo ? > /proc/sysrq-trigger

The official kernel documentation, Linux Magic System Request Key Hacks, is highly recommended reading.


The Real Lifesaver: netconsole

Back to the "blindness" problem. We need to see the logs, but the system freeze means the local terminal can't write to them (or refresh them).

This is where netconsole comes in.

If you remember Chapter 7 (where we used it while discussing Oops on ARM), netconsole can send all kernel printk messages over the network in real time to another machine.

Let's quickly reproduce the configuration process here.

My current setup is:

  • Sender: The VM running the letspanic module.
  • Receiver: The host machine (or another machine on the LAN).

Get the receiver running first (listening on UDP port 6666):

netcat -d -u -l 6666 | tee -a klog_from_vm.txt

netcat will block and wait for packets, displaying each one as it arrives and saving them to klog_from_vm.txt.

Load the netconsole module on the sender:

The parameter format is slightly counter-intuitive, looking like this:

netconsole=[+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]

My command (please replace with your own IP and network interface name):

sudo modprobe netconsole netconsole=@192.168.1.20/enp0s8,@192.168.1.101/

What this means is: send all local printk output through the enp0s8 network interface to 192.168.1.101. The ports are omitted, so netconsole's defaults apply: source port 6665 and target port 6666, which is exactly why our receiver listens on 6666.
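For completeness, the same configuration with every field spelled out would look roughly like this (the IP addresses are examples; 6665 and 6666 are netconsole's default source and target ports):

```shell
sudo modprobe netconsole \
    netconsole=6665@192.168.1.20/enp0s8,6666@192.168.1.101/
```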

Now, everything is in place. Execute insmod ./letspanic.ko again.

This time, although the VM's screen remains dead, your receiver window will scroll furiously with the kernel's last words.

This feels incredible—we can finally see exactly how it died.


Line-by-Line Breakdown: The Panic's Final Moments

Now that we have the logs, let's tear them apart. The format is actually very similar to an Oops.

The first thing that catches the eye is the most prominent error header:

Kernel panic - not syncing: whoa, a kernel panic! myglobalstate = 0xeee

This line has the highest priority level, KERN_EMERG, and will attempt to broadcast to all consoles.

There are a few key points here:

  1. "not syncing": This is a classic phrase in kernel Panics. What it means is: I know there's a bunch of data in memory that hasn't been flushed to disk, but I'm intentionally not doing it. Why? Because at this point, the system state is already corrupted. Forcing disk writes could crash the filesystem and cause data corruption. Choosing the lesser of two evils, it abandons synchronization.

  2. Custom message: The whoa, a kernel panic! ... that follows is the argument we passed to panic().

Next comes a large snapshot of the system state:

  • Process context: It's definitely insmod that messed up here.
  • Taint flags: Whether the kernel has been tainted by non-GPL modules, etc.
  • Kernel version.
  • Hardware details.
  • Call stack: If CONFIG_DEBUG_BUGVERBOSE is enabled (it usually is), the kernel will call dump_stack() to print out the path it took to reach its demise. This is the most important clue.
  • Registers and instruction pointer: RIP value, machine code, CPU register snapshot.
  • KASLR offset: If your kernel has address randomization enabled (standard for modern kernels), this tells you the offset when the kernel image was loaded.
  • Closing error:
---[ end Kernel panic - not syncing: whoa, a kernel panic! myglobalstate = 0xeee ]---

We already covered the in-depth interpretation of these details in Chapter 7's Devil in the details – decoding the oops, so we won't repeat that here.


Diving Inside the panic() Function

Let's take a look at the source code. Where do all these outputs come from? In the 5.10.60 kernel, the beginning of the panic() function looks like this:

void panic(const char *fmt, ...)
{
	static char buf[1024];
	va_list args;
	[...]
	va_start(args, fmt);
	len = vscnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);
	[...]
	pr_emerg("Kernel panic - not syncing: %s\n", buf);
	[...]
}

This is an exported symbol, which is why our module can call it directly.

The logic is clear: first parse the incoming format string into buf, then immediately shout it out using the highest priority, pr_emerg.

After shouting, what then? The kernel enters an infinite loop.

But before the infinite loop, it still has to do some "end-of-life care."

1. Emergency Information Dump (panic_print_sys_info)

The kernel has a boot parameter called panic_print, which is a bitmask. The default is 0, meaning nothing extra is printed beyond the basic Panic information.

But you can adjust it. For example, if you want to see all task states, memory information, timer states, lock states, etc., you can add panic_print=0x3f to your boot parameters (see Table 10.2 for the meaning of each bit). This is very helpful for debugging certain special deadlock issues.
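panic_print is also exposed as a sysctl, so you can change it on a running system without touching the boot command line; a sketch (needs root; 63 is decimal for 0x3f):

```shell
sudo sh -c 'echo 63 > /proc/sys/kernel/panic_print'
cat /proc/sys/kernel/panic_print
```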

2. Blinking the Keyboard LEDs (panic_blink)

At the very end of the code, the kernel stops in an infinite loop, but within that loop, it periodically calls an architecture-specific function, panic_blink().

On x86, this is hooked into the keyboard driver (drivers/input/serio/i8042.c). It makes the keyboard LEDs blink frantically.

The purpose is simple: if you're using a graphical interface (X Window) and the screen freezes, you can't tell if the GPU crashed or if the kernel actually Panicked. Seeing the keyboard lights blink tells you: the kernel died, not the monitor.

Here is the actual code at the tail of the panic() function, right after the final message is printed:

pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);

/* Do not scroll important messages printed above */
suppress_printk = 1;
local_irq_enable();
for (i = 0; ; i += PANIC_TIMER_STEP) {
	touch_softlockup_watchdog();
	if (i >= i_next) {
		i += panic_blink(state ^= 1);
		i_next = i + 3600 / PANIC_BLINK_SPD;
	}
	mdelay(PANIC_TIMER_STEP);
}

Note the suppress_printk = 1 here. The system is about to spin forever and you can't scroll a dead console back, so any further printk output would push the critical diagnostics we just printed off the screen. Setting this flag locks the display on the most important frame.

3. Other Escape Routes

The default behavior of panic() is to hang, but it has a few "detours":

  • kexec/kdump: If you have kexec enabled in your kernel configuration and a crash kernel set up, when a Panic occurs, the kernel won't just sit in a dumb infinite loop. Instead, it will immediately boot another backup kernel. That backup kernel's sole mission is to dump the current main memory data to disk. This is the standard crash capture solution in the server domain.

  • Panic Notifier Chain: The kernel allows other modules to register a "panic notifier chain." When a Panic occurs, these callback functions are triggered to perform specific cleanup or logging operations. We'll play with this ourselves in the next section.

  • panic=n parameter: If you pass panic=10 in your boot parameters, it means the kernel will automatically reboot 10 seconds after a Panic. This is common on unattended embedded devices—reboot immediately after a crash and see if it can self-heal.
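The panic timeout is likewise adjustable at runtime via sysctl, so you don't have to reboot just to change it (needs root):

```shell
sudo sysctl -w kernel.panic=10   # auto-reboot 10 seconds after a Panic
sudo sysctl -w kernel.panic=0    # back to the default: hang forever
```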


Quick Reference: Kernel Parameters and Configuration

Finally, for easy reference, here are the main parameters and configurations that affect Panic behavior.

Table 10.1 – Panic-Related Kernel Parameters, Sysctl Tuning Knobs, and Configuration Macros

Here are just the most critical ones:

  • panic (boot parameter): how many seconds after a Panic to wait before automatically rebooting; 0 (the default) means hang forever.
  • panic_on_oops (boot parameter/sysctl): when set to 1, any Oops immediately escalates to a Panic. Useful on certain critical business systems (better dead than wrong).
  • panic_print (boot parameter): bitmask controlling how much extra information is printed during a Panic (see Table 10.2 below).
  • panic_on_unrecovered_nmi (sysctl): whether to Panic on an unrecoverable NMI (Non-Maskable Interrupt).
  • panic_on_io_nmi (sysctl): whether an I/O-generated NMI triggers a Panic.
  • panic_on_warn (sysctl): use with caution. If set to 1, any kernel WARN_ON() trigger causes a Panic. This is extremely useful during development, helping you catch hidden dangers that would otherwise be ignored.

Table 10.2 – panic_print Bitmask Breakdown

This parameter determines what extra information you want to see beyond the stack trace.

Bit values as defined in 5.10's kernel/panic.c:

  • Bit 0 (0x01), PANIC_PRINT_TASK_INFO: print the state of all tasks (like pressing SysRq-t).
  • Bit 1 (0x02), PANIC_PRINT_MEM_INFO: print system memory usage (like pressing SysRq-m).
  • Bit 2 (0x04), PANIC_PRINT_TIMER_INFO: print kernel timer information.
  • Bit 3 (0x08), PANIC_PRINT_LOCK_INFO: print the state of all held locks (if lock debugging is enabled).
  • Bit 4 (0x10), PANIC_PRINT_FTRACE_INFO: dump the ftrace buffers.
  • Bit 5 (0x20), PANIC_PRINT_ALL_PRINTK_MSG: replay the entire kernel log buffer.
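The 0x3f value used in this chapter is simply the OR of panic_print's six low bits (bit values as defined in 5.10's kernel/panic.c); a quick shell check:

```shell
mask=$(( 0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 ))
printf 'panic_print=0x%x\n' "$mask"   # every class of extra info enabled
```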

Hands-On: Verifying panic_print

Now that we understand the theory, let's verify it hands-on.

Exercise 10.1: Adjust panic_print, reload the letspanic module, and see what extra information appears in the logs.

  1. Modify your kernel boot parameters to include panic_print=0x3f (or whatever bits you want to test).
  2. Reboot the VM.
  3. Ensure the netconsole receiver is ready.
  4. insmod ./letspanic.ko.

You should see a massive amount of extra system state information on the receiver side. This is often the key clue for inferring the cause of a crash.

Alright, now that we fully understand the kernel's behavior during a Panic, it's time for something advanced—how to inject our own code when a Panic occurs (such as notifying external hardware).

In the next section, we'll implement a custom Panic handler.