HN Mirror

by mozdeco 1641 days ago

> code that can end up blocking forever should have a timeout and recover from that timeout happening.

There was no way for the calling code to do this. This was literally an infinite loop inside the network stack. Imagine the network stack itself going `while(1) {}` on you, without checking if the request was canceled.

Even if you detect that this happens, there is nothing you can do as the caller. You can't even properly stop the thread, as it is not cooperating. So recovering from this type of failure is hard.
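To make the point about an uncooperative thread concrete, here is a minimal Python sketch, not the actual network-stack code. The names `stuck_worker`, `cancel`, and `demo_release` are hypothetical. `demo_release` exists only so the demo can exit; a real `while(1) {}` would have no such escape hatch.

```python
import threading
import time

cancel = threading.Event()        # the only signal the caller can send (hypothetical)
demo_release = threading.Event()  # demo-only escape hatch so this script can exit


def stuck_worker():
    # Stand-in for the buggy loop: it never looks at `cancel`.
    while not demo_release.is_set():
        time.sleep(0.01)


def caller_with_timeout(timeout_s=0.2):
    t = threading.Thread(target=stuck_worker, daemon=True)
    t.start()
    t.join(timeout_s)             # the caller's timeout fires...
    timed_out = t.is_alive()
    cancel.set()                  # ...so it requests cancellation...
    t.join(timeout_s)
    still_running = t.is_alive()  # ...but the worker never checks the flag
    return t, timed_out, still_running


worker, timed_out, still_running = caller_with_timeout()
demo_release.set()  # only possible because this is a demo
worker.join()
```

The caller can detect the hang (`timed_out`) but cannot end it (`still_running`). Python's `threading` module has no call for killing a thread from outside.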

cxr 1641 days ago

> There was no way for the calling code to do this

Like what happened in a comment that I called out yesterday, you're silently inserting extra qualifiers that aren't in the original; the person you're responding to didn't say anything about calling code.

If the network stack can end up doing the equivalent of `while(1) { /*...*/ }`, then that's the bug, no matter what's in the elided part. It's not true that there's "no way" to deal with this. (In the specific case of `while(1)`, which I recognize is a metaphor and not a case study, so onlookers should please spare us the sophomoric retort, it's as simple as changing to `while(i < MAX_TRIES)` with some failover checks.) In some industries, this sort of thing is mandatory.
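A rough sketch of the bounded-loop idea, in Python rather than the C-style pseudocode above. `bounded_retry`, `GaveUp`, and the `(done, value)` convention for `step` are hypothetical names chosen for illustration. `MAX_TRIES` is taken from the comment, and its value here is arbitrary.

```python
MAX_TRIES = 5  # arbitrary bound for illustration


class GaveUp(Exception):
    """Raised when the bound is hit, so the caller can fail over."""


def bounded_retry(step, max_tries=MAX_TRIES):
    # `step` is a hypothetical callable returning (done, value).
    for _attempt in range(max_tries):
        done, value = step()
        if done:
            return value
    # The loop cannot spin forever: after max_tries it reports failure.
    raise GaveUp(f"no progress after {max_tries} tries")


def never_finishes():
    # Stand-in for a state machine that never reaches its exit condition.
    return (False, None)


try:
    bounded_retry(never_finishes)
    outcome = "returned"
except GaveUp:
    outcome = "gave up"
```

The failover check lives in the `except GaveUp` branch: the caller gets an error it can act on instead of a hang.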

marcan_42 1641 days ago

It's a bug. Are you saying there's some magical way of eliminating all possible infinite loops from code? Please write a paper on this amazing technique; I'm pretty sure that's equivalent to solving the halting problem and the computer science community would love to see a proven unsolvable problem being solved.

hollerith 1638 days ago

You write good comments usually, so IMHO this comment is worth replying to:

There is no algorithm that will determine the "halting status" of an arbitrary (program, input) pair, but that does not prevent a team of programmers from working in a subset of the set of all programs in which every program halts. Restricting themselves to that subset might make the team less productive (i.e., raise the cost of implementing things), but it probably does not materially limit what the team can accomplish (i.e., what functionality the team can implement) provided they're not developing a "language processor" (a program that takes another program as input).
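One way to picture working inside such a subset is to allow only loops whose iteration count is fixed before the loop starts. The following is a sketch under that assumption, with the hypothetical function name `bounded_binary_search`. It is not a claim about how any particular team enforces this.

```python
def bounded_binary_search(items, target):
    """Binary search over a sorted list using only a bounded `for` loop."""
    lo, hi = 0, len(items)
    # The interval [lo, hi) at least halves on every pass, so it is empty
    # after at most len(items).bit_length() passes; one extra pass lets
    # the empty-interval check run.
    for _ in range(len(items).bit_length() + 1):
        if lo >= hi:
            return -1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return -1  # reached only if the bound is exhausted


found_at = bounded_binary_search([1, 3, 5, 7, 9], 7)
missing = bounded_binary_search([1, 3, 5, 7, 9], 4)
```

Because the loop is a `for` over a finite range, termination holds by construction. No analysis of arbitrary programs is needed.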

cxr 1641 days ago

Your desire for your insolence to be noted is granted, but to answer the non-strawman form of your question: yes, there is a way to prevent infinite loops from making their way into software in the field. It means providing proofs that your loops terminate. (If you can't show this, your code has to be rewritten into something that you can come up with a proof for.) As I already said, this is mandatory in some industries. The philosophy is also not far off from the rationale for Rust's language design re memory management. And although it might seem like it requires it, there's no need for magic. This is something covered in any ("every"?) decent software engineering program.
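A termination proof for a loop usually rests on a loop variant: a non-negative integer that strictly decreases on every iteration. The sketch below uses Euclid's algorithm as an assumed example and checks the variant at runtime with an assertion. A real proof would establish the same fact statically. The function name `gcd_with_variant` is hypothetical.

```python
def gcd_with_variant(a, b):
    """Euclid's algorithm with its loop variant checked on every pass."""
    assert a >= 0 and b >= 0
    while b != 0:
        variant_before = b
        a, b = b, a % b
        # Variant: b is a non-negative integer that strictly decreases,
        # so the loop can run at most `variant_before` more times.
        assert 0 <= b < variant_before
    return a


result = gcd_with_variant(48, 18)
```

If no such variant can be found for a loop, that is the cue to restructure the loop until one exists.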

throwaway984393 1641 days ago

I went and looked at the code (it's linked in the article). You absolutely can put a timeout around a case/switch statement. There's like 5 different ways to do it. And the code calling network syscalls can also have timeouts, obviously; otherwise nobody would ever be able to time out any blocked network operation. This is all network programming 101.
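For the syscall half of that claim only, here is a minimal Python sketch of a timeout on a blocking receive. It uses a local `socket.socketpair()` so that it runs without a network. The helper name `recv_with_timeout` is hypothetical, and the sketch says nothing about the code linked in the article.

```python
import socket


def recv_with_timeout(sock, timeout_s):
    # settimeout() makes a blocking recv() raise instead of waiting forever.
    sock.settimeout(timeout_s)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None  # the caller gets control back and can recover


a, b = socket.socketpair()
try:
    silent = recv_with_timeout(a, 0.1)  # peer has sent nothing: times out
    b.sendall(b"ping")
    got = recv_with_timeout(a, 0.1)     # data is waiting: returns it
finally:
    a.close()
    b.close()
```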

acdha 1641 days ago

If it's that easy, I'm sure they'd accept your pull request.
