
1.9 The Zen and Intuition of Debugging

In the previous section, we spent a fair amount of time "sharpening our tools"—building production and debug kernels. Now, you have two different blades in your toolbox.

But before you start swinging, we need to talk about mindset.

This section won't cover specific commands or configuration files. Instead, we'll discuss how to think when you're staring down a baffling Bug.


1.9.1 The Boundary Between Science and Art

Let me start with a reality check: debugging is not just a technical skill—it's an art.

That might sound fluffy, but if you've ever found yourself staring at code on a screen until you start questioning your life choices, you know exactly what I mean. It's a science because it requires rigorous logic to reproduce and locate problems; it's an art because the "eureka moment" that solves the problem often comes from experience and intuition, not documentation.

You might find the following advice a bit cliché.

And honestly, none of these principles are new. The strange thing is, under pressure, we tend to completely forget these simplest rules and charge headfirst into a dead end.

So, treat this as a halftime break. Take a breath, and then get back in the game.


1.9.2 The First Commandment: Make No Assumptions

Churchill famously said: "Never, never, never give up."

Our version is: "Never, never, never make assumptions."

Assumptions are the root of countless Bugs. Think back to those "epic fails" we discussed at the beginning of this chapter—which one wasn't caused by a subconscious, incorrect assumption made by the designer or programmer?

Here's a slightly crude but accurate bit of wordplay:

Look at the word "assume": it breaks into three parts, ASS, U, and ME. In other words, to assume is to make an ASS out of U and ME.

If you don't want to make an ass of yourself (and your teammates), stop assuming "this part should be fine."

How do you fight assumptions? Let the code speak.

Using assertions in your code is the best way to catch assumptions.

  • In user space, use the assert() macro (see its man page, assert(3)).
  • In the kernel, we have dedicated macros for this: BUG(), WARN(), and VM_BUG_ON(). We'll dive deep into them in Chapter 12, so consider this a teaser.

1.9.3 Don't Get Lost in the Bushes, Look at the Forest

Sometimes, code paths are as tangled as a bowl of spaghetti. When you're deep in a maze of if and else, it's easy to forget "what is this chunk of code actually trying to do?"

This is called "not seeing the forest for the trees."

The moment you realize you're stuck, force yourself to zoom out.

Stop and ask yourself: What is the macro goal of this code? What should its inputs and outputs be? This "stepping out of the frame to look at the picture" mindset often helps you spot the flawed assumption causing the error.

In moments like these, well-written documentation is a lifeline. This is why I always emphasize: don't be lazy, write documentation.


1.9.4 Make the Problem Smaller

When you hit a tricky Bug, try this tactic:

Build a minimal, reproducible scenario.

Strip away all irrelevant code, configurations, and dependencies. Keep only the core few lines needed to reliably reproduce the bug. Doing this is often, in itself, the process of tracking down the root cause.

What's even more interesting (and I've experienced this many times) is that when you try to "distill" the problem and write it down—even during the process of documenting the issue—your brain suddenly goes "ding!":

"Wait, if it works like this here, then over there..."

And you find the answer before you even run the code.


1.9.5 Debugging Takes More Brainpower Than Writing Code

Brian Kernighan said something in The Elements of Programming Style that has been quoted countless times:

"Debugging is twice as hard as writing the code in the first place."

If you don't go all out when writing the code, you'll pay for it double when debugging.

The core advice here is: don't rush to write code; lay the foundation first.

  1. Write a brief high-level design document.
  2. Write down what you expect the code to do (high-level abstraction).
  3. Only then worry about the details (the so-called low-level design document).

Good documentation saves you time. Trust me, your future self will get down on its knees and thank your present self.

This reminds me of another famous quote:

"An ounce of design is worth a pound of refactoring." — Karl Wiegers


1.9.6 Zen and Beginner's Mind

Sometimes, the code is a tangled mess of spaghetti that just smells wrong.

If you can still start over, deleting it and rewriting it might actually be the most efficient path. This is a form of "beginner's mind."

But "beginner's mind" has another layer of meaning: temporarily let go of your ego.

"I wrote this code, how could it be wrong?" "This logic is perfect, it must be a compiler bug."

These thoughts are the arch-enemies of debugging. You need to try looking at the code and the environment through the eyes of a complete stranger.

This is also why code review is so effective: your colleagues don't share your psychological blind spots, so they can spot at a glance the bugs you've been looking right past.

Of course, there's one infallible secret technique: go to sleep.

Seriously, don't force it. Often, two hours of grinding late at night is no match for ten focused minutes after waking up.


1.9.7 The Art of Naming and the Measure of Comments

I once saw a discussion on Quora: What is the hardest thing for a programmer?

The top-voted answer was actually—naming variables.

It sounds funny, but the more you think about it, the truer it gets. Variable names are sticky; once set, they follow you for a long time.

  • int i as a loop index? Great.
  • int theloopindex? That's a bit pretentious and strains the eyes.

How do you strike the right balance?

  • Names: Should clearly express intent, but don't be overly verbose.
  • Comments: Are for explaining "why it's designed this way" and "the logic behind the code," not for explaining "how this line of code works."

Any competent programmer can figure out what a = b + c is doing; they don't need your comment for that. But no one knows why 1 was added here—and that's when a comment is a lifeline.


1.9.8 Don't Ignore the Logs

This sounds like a no-brainer, but under high pressure, we often overlook the most obvious things.

Carefully check the kernel logs (and the application-layer logs too). When it helps, view them newest-first: journalctl supports this with its -r (--reverse) option, while dmesg prints in chronological order, so the most recent messages are at the end of its output. Either way, you can see exactly what happened right before disaster struck.

Linux's systemd provides journalctl(1), which is a godsend. If you aren't proficient with it yet, go learn it now. It will repay you.


1.9.9 The Brutal Truth About Testing

Here is a brutal truth:

"Testing shows the presence, not the absence of bugs."

— Edsger W. Dijkstra

But that doesn't mean we should give up on testing. On the contrary, testing and QA are the most critical parts of the software process, and ignoring them comes at a massive cost.

Spend time writing thorough test cases—both positive and negative. This pays off handsomely in the long run.

  • Negative testing and Fuzzing are crucial for exposing security vulnerabilities.
  • Code coverage analysis: Don't just rely on gut feeling to say "it's tested." Let the tools do the talking. 100% coverage (combined with runtime testing) is the goal.

We'll dive deep into kernel code coverage tools and testing frameworks in Chapter 12. For now, just remember the takeaway: don't be lazy.


1.9.10 Technical Debt

Sometimes, you look at the code you wrote and feel a vague pang of guilt:

"It works, but it's not written well... some edge cases aren't handled... but it's just a small case, it should be fine, right?"

With a deadline looming, the temptation to just check it in is huge.

Resist it.

There is something in this world called "technical debt." It's like a credit card: you can overspend now (write bad code), but the interest (future maintenance costs) will be so high it bankrupts you. This debt will have to be paid eventually, and the debt collectors will come knocking.


1.9.11 Those Stupid Mistakes

If I got a penny every time I made a rookie mistake, I'd be a rich man by now.

Here's a true story: I once spent the better part of a day tearing my hair out debugging a C program that just wouldn't work. The logic was fine, the code was fine...

Until I realized I was editing the correct code, but compiling an old version—because I was running make in the wrong directory.

(I believe you've had those moments where you wanted to smash your keyboard too.)

At times like this, the best thing to do is step away from the computer, go drink a glass of water, or go to sleep. Your brain is tired.


1.9.12 The Empirical Model: Don't Trust Books, Trust Experiments

Figure 1.6 – Be empirical!

The word "empirical" means: verified through observation and experiment rather than through theory alone.

So, don't trust books (except this one, of course), don't trust tutorials, don't trust blogs, and don't trust so-called experts—including me.

Try it yourself.

See the results with your own eyes.

Years ago, on my first day at a new job, a colleague sent me a document that I still treasure today: The Ten Commandments for C Programmers, by Henry Spencer.

Though a bit clunky, it inspired me to put together a quick-reference checklist for you.


The Seven Commandments for Programmers

Crucial! Run through this checklist before every commit:

  1. Check for failure in all APIs. Don't just write the success path. All system calls and library functions can fail. Handle it.

  2. Compile with all warnings enabled. At minimum, add -Wall. Ideally, add -Wextra, or even -Werror (treat warnings as errors; the kernel can be built this way too, via the CONFIG_WERROR option). Eliminate all warnings.

  3. Never trust input. Especially user input. Validate it. Then validate it again.

  4. Eliminate dead code immediately. Useless code, commented-out code—delete it. Leaving it around is a hazard.

  5. Test thoroughly. 100% code coverage is the goal. Spend time learning those powerful tools: memory checkers, static/dynamic analyzers, security checkers, fuzzers, code coverage tools... don't neglect security.

  6. Hardware lies too. For kernel and driver developers, once you've ruled out software issues, don't forget that faulty peripheral hardware could also be the culprit. Don't easily rule out this possibility! (You'll understand once you've been burned).

  7. Make no assumptions. Remember that word ASS U ME. Use assertions to catch assumptions, thereby catching Bugs.

We'll return to these rules repeatedly in the chapters ahead.

