by mozdeco 1596 days ago
> code that can end up blocking forever should have a timeout and recover from that timeout happening.

There was no way for the calling code to do this. This was literally an infinite loop inside the network stack. Imagine the network stack itself going `while(1) {}` on you, without checking if the request was canceled.

Even if you detect that this happens, there is nothing you can do as the caller. You can't even properly stop the thread, as it is not cooperating. So recovering from this type of failure is hard.
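
To make the caller's position concrete, here is a minimal sketch in Python (not the language of the actual bug; `stuck_worker` and `call_with_timeout` are hypothetical names made up for illustration). The caller can notice that the deadline passed, but Python's `threading` module has no call to forcibly stop the thread, so the stuck work keeps running:

```python
import threading
import time


def stuck_worker():
    # Hypothetical stand-in for the buggy library code: it loops forever
    # and never checks any cancellation flag.
    # (It sleeps only so the demo does not burn a CPU core.)
    while True:
        time.sleep(0.01)


def call_with_timeout(fn, timeout_s):
    """Run fn in a daemon thread and wait at most timeout_s seconds.

    Returns (finished, thread). A daemon thread is used so the stuck
    worker does not keep the interpreter alive at exit.
    """
    thread = threading.Thread(target=fn, daemon=True)
    thread.start()
    thread.join(timeout_s)  # returns after timeout_s even if fn is still running
    return (not thread.is_alive()), thread


finished, thread = call_with_timeout(stuck_worker, 0.1)
print("finished:", finished)
print("still running:", thread.is_alive())
```

The timeout gives detection only. Actually reclaiming the thread needs cooperation from the code inside the loop, or a heavier boundary such as a separate process that can be killed.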

2 comments

> There was no way for the calling code to do this

Like what happened in a comment that I called out yesterday, you're silently inserting extra qualifiers that aren't in the original; the person you're responding to didn't say anything about calling code.

If the network stack can end up doing the equivalent of `while(1) { /.../ }`, then that's the bug, no matter what's in the elided part. There's not "no way" to deal with this. (In the specific case of `while(1)`, which I recognize is a metaphor and not a case study, so onlookers should please spare us the sophomoric retort, it's as simple as changing to `while(i < MAX_TRIES)` with some failover checks.) In some industries, this sort of thing is mandatory.
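
A rough sketch of the `while(i < MAX_TRIES)` idea, in Python for brevity. Everything here apart from `MAX_TRIES` is a hypothetical name (`bounded_retry`, `GaveUp`, `always_fails`), and the "failover check" is reduced to raising an error the caller can handle:

```python
MAX_TRIES = 5


class GaveUp(Exception):
    """Raised when the loop hits its iteration bound without a result."""


def bounded_retry(step, max_tries=MAX_TRIES):
    """Call step() until it returns something other than None.

    The loop runs at most max_tries times, so it terminates no matter
    what step() returns.
    """
    for _attempt in range(max_tries):
        result = step()
        if result is not None:
            return result
    # Failover path: report the failure instead of spinning forever.
    raise GaveUp(f"no result after {max_tries} tries")


calls = []


def always_fails():
    # Hypothetical step that never makes progress.
    calls.append(1)
    return None


try:
    bounded_retry(always_fails)
    outcome = "ok"
except GaveUp as exc:
    outcome = str(exc)

print(outcome)  # prints "no result after 5 tries"
```

The bound only guarantees that the loop itself ends. It assumes each call to `step()` returns; if `step()` can block forever, that needs its own limit.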

It's a bug. Are you saying there's some magical way of eliminating all possible infinite loops from code? Please write a paper on this amazing technique; I'm pretty sure that's equivalent to solving the halting problem and the computer science community would love to see a proven unsolvable problem being solved.

You write good comments usually, so IMHO this comment is worth replying to:

There is no algorithm that will determine the "halting status" of an arbitrary (program, input) pair, but that does not prevent a team of programmers from working in a subset of the set of all programs in which every program halts. Restricting themselves to that subset might make the team less productive (i.e., raise the cost of implementing things), but it probably does not materially limit what the team can accomplish (i.e., what functionality the team can implement) provided they're not developing a "language processor" (a program that takes another program as input).

Your desire for your insolence to be noted is granted, but to answer the non-strawman form of your question: yes, there is a way to prevent infinite loops from making their way into software in the field. It means providing proofs that your loops terminate. (If you can't show this, your code has to be rewritten into something that you can come up with a proof for.) As I already said, this is mandatory in some industries. The philosophy is also not far off from the rationale for Rust's language design re memory management. And although it might seem like it requires it, there's no need for magic. This is something covered in any ("every"?) decent software engineering program.
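
For readers who haven't seen a termination argument: the usual shape is a "loop variant", a non-negative integer that strictly decreases on every iteration, so the loop cannot run forever. A small sketch using Euclid's algorithm follows. The runtime `assert` only checks the variant while the code runs; it is an illustration of the argument and not a proof, and not a description of what any particular industry tool does:

```python
def gcd(a, b):
    """Greatest common divisor of two non-negative integers.

    Termination argument: b is the loop variant. It is a non-negative
    integer, and each iteration replaces it with a % b, which is
    strictly smaller than b. So the loop runs at most b times.
    """
    assert a >= 0 and b >= 0
    while b != 0:
        variant_before = b
        a, b = b, a % b
        # Check the variant: still non-negative, strictly smaller.
        assert 0 <= b < variant_before
    return a


print(gcd(48, 18))  # prints 6
```

If no such decreasing quantity can be found for a loop, that's the signal to restructure it until one can.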

I went and looked at the code (it's linked in the article). You absolutely can put a timeout around a case/switch statement. There's like 5 different ways to do it. And the code calling network syscalls can also have timeouts, obviously; otherwise nobody would ever be able to time out any blocked network operation. This is all network programming 101.
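
As an illustration of the syscall-timeout part only, here is a minimal Python sketch. It is not taken from the linked code; `socket.socketpair()` is used so it runs without a network, and `recv_or_timeout` is a hypothetical helper name:

```python
import socket


def recv_or_timeout(sock, timeout_s):
    """Try to read from sock; return None if nothing arrives in time."""
    sock.settimeout(timeout_s)  # applies to subsequent blocking socket calls
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None


# Two connected sockets in one process; nothing is ever written to `a`,
# so a plain blocking recv() on it would wait forever.
a, b = socket.socketpair()
try:
    data = recv_or_timeout(a, 0.1)
finally:
    a.close()
    b.close()

print("timed out" if data is None else data)
```

This covers a call that blocks inside a socket operation. It doesn't by itself interrupt code that spins in a loop without ever reaching a blocking call.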

If it's that easy, I'm sure they'd accept your pull request.