by eszed 863 days ago
This is a great mystery story, with a satisfying ending. And this

> "I generally start troubleshooting an issue by asking the system what it is doing," explained Zimmie. "Packet captures, poking through logs, and so on. After a few rounds of this, I start hypothesizing a reason, and testing my hypothesis. Basic scientific method stuff. Most are simple to check. When they're wrong, I just move on. As I start narrowing down the possibilities, and the hypotheses are proven, it's electric. Making an educated guess and proving it's right is incredibly satisfying."

is an approach every one of us should internalize.

3 comments

Binary search (or bisecting) is also an incredibly valuable approach that I don’t see junior and intermediate engineers reach for nearly as often as they should.

When something is failing, find a midpoint between where things are working and where the bug is manifesting. Do you see evidence of the bug? If not, look earlier in the pipeline. If so, look later. Repeat.
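
To make the loop above concrete, here is a minimal Python sketch. It assumes the bug "sticks" once introduced (every stage after the faulty one also shows it) and that the last stage is known to be bad. The stage names and the `looks_ok` probe are hypothetical stand-ins for whatever you can actually inspect (logs, packet captures, intermediate output).

```python
from typing import Callable, Sequence


def first_bad_stage(stages: Sequence[str], looks_ok: Callable[[int], bool]) -> int:
    """Return the index of the first stage whose output shows the bug.

    Assumes the last stage is known bad, and that every stage after the
    first bad one is bad too (the damage propagates downstream).
    """
    lo, hi = 0, len(stages) - 1  # invariant: the first bad stage is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if looks_ok(mid):
            lo = mid + 1  # output still fine here, so the bug starts later
        else:
            hi = mid      # bug already visible, so it starts here or earlier
    return lo


# Hypothetical pipeline; pretend the data first goes wrong in "worker".
stages = ["web form", "api", "queue", "worker", "database"]
culprit = first_bad_stage(stages, lambda i: i < 3)
print(stages[culprit])  # worker
```

With five stages this takes two probes instead of up to four, and the gap widens quickly as the pipeline grows.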

In my experience this process is the primary distinguisher between those who flail around looking for a root cause and the people who can rapidly come to an answer.

Good call. When you've got no idea where to start, that's how to start.

Mostly, though, I think people "flail" because they don't know the pipeline well enough to even do that. I know I've been in that position before, when approaching completely new (to me) systems. (Sometimes there isn't someone more knowledgeable you can ask!) That's where I find hypothesis -> test -> refine particularly useful. You're still wrong far, far more often than you're right, but it stops feeling like flailing, and more like making progress towards understanding the system well enough to apply other techniques (whatever they might be) more smartly.

`git bisect` is one of those things I wish I'd internalized sooner in my career. It can be so incredibly powerful, especially when you just hand it a shell script (`git bisect run`) and let it rip without having to guide it by hand.
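
For anyone who hasn't tried it, a rough sketch of a check script you could hand to `git bisect run`, written in Python rather than shell. The exit codes are the ones git documents: 0 means good, 1 to 127 (except 125) means bad, and 125 means "skip this commit". The file name `bisect_check.py` and the `make build` / `make test` commands are assumptions; substitute whatever builds and tests your project.

```python
#!/usr/bin/env python3
"""Hypothetical check script for `git bisect run`."""
import shlex
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes that git bisect run understands


def check(build_cmd, test_cmd, run=subprocess.call):
    """Map build/test results onto git bisect exit codes.

    `run` takes an argument list and returns the command's exit status;
    it is a parameter only so the mapping can be exercised without
    running real commands.
    """
    if run(build_cmd) != 0:
        return SKIP  # commit does not build, so it cannot be judged
    return GOOD if run(test_cmd) == 0 else BAD


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(check(shlex.split(sys.argv[1]), shlex.split(sys.argv[2])))
    # Sketch only: this path still exits 0, which git would read as "good".
    # A real script should fail loudly here instead.
    print("usage: bisect_check.py BUILD_CMD TEST_CMD", file=sys.stderr)
```

Usage would look something like `git bisect start <bad> <good>` followed by `git bisect run python3 bisect_check.py "make build" "make test"`. Returning 125 for commits that don't build keeps broken intermediate commits from being blamed for the bug.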

Once someone understands a complex system well enough to find a good midpoint, are they still a junior engineer?

I use this technique all the time to help people who are stuck with problems using software - often caused by bugs. Divide-and-conquer quickly isolates the issue. I try to share the technique when I use it, or just offer it as a suggestion.

Part of why it's so useful is you hardly have to understand anything about the system internally. If you don't already have a working case, just reduce the complexity of what you're doing until it works; that gives you the lower bound.
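
A minimal sketch of that in Python, assuming exactly one item (a plugin, a setting, a step in the user's workflow) causes the failure and that you can re-run with any subset. The `fails` callback and the item names are hypothetical; in practice "running the check" is often just asking the user to try again with half the things turned off.

```python
from typing import Callable, List, Sequence


def find_culprit(items: Sequence[str], fails: Callable[[List[str]], bool]) -> str:
    """Halve the candidate set until a single item remains.

    Assumes `items` is non-empty, exactly one item triggers the failure,
    and fails(subset) is True whenever the subset contains that item.
    """
    candidates = list(items)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if fails(half):
            candidates = half                    # culprit is in the first half
        else:
            candidates = candidates[len(half):]  # so it must be in the rest
    return candidates[0]


# Hypothetical example: five plugins enabled, "d" is the broken one.
plugins = ["a", "b", "c", "d", "e"]
print(find_culprit(plugins, lambda subset: "d" in subset))  # d
```

Note that nothing here needs to know why "d" breaks things, only whether a given configuration works. If two items only fail in combination, this simple version can pick the wrong one.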

That random guessing is like gambling - you hope for a big quick payout but when your hypothesis fails, you end up worse off than before. Wasted time and no closer to the solution.

100%

I've wondered why this isn't second nature to engineers, junior or otherwise.

Maybe they don't really understand the pipeline? ("I enter the value in the web form and it just appears in the database.")

I think I kind of internalized that idea from my early soft eng courses; after seeing how efficiently a computer can find a result by cutting the set in half repeatedly, I've tried to apply that approach elsewhere when it fits.

I like that term. I always called this "divide and conquer."

I remember one of the Car-Talk guys using the term "binary chop" once when talking to a caller about diagnosing a problem.

> I don’t see junior and intermediate engineers reach for nearly as often as they should

Or senior engineers

That's just how you debug any system.

If you're on this site and haven't already internalized it...how do you debug?

How do you debug if this isn’t what you’re doing? I’m genuinely curious… are you using some sort of advanced tools like a psychopath?

1. Internet search 2. Make random changes 3. Test 4. Start over.

I can't even imagine how that would work with any complicated system.

Not well. It's pretty frustrating to observe its practitioners in the wild.