Hacker News
by mbrubeck 2281 days ago
Panicking is, in many ways, the best-case scenario for code that contains a bug. A bug that causes a panic is much easier to find through fuzzing, property testing, and even static analysis or analysis of failures in production. It also keeps the bug from "infecting" other code, which is what happens when the program is allowed to proceed in an invalid state.

If you don't want a panic to take down the whole system, you can isolate the code in a thread and use a supervision tree, or use `catch_unwind` to let the thread perform cleanup and then continue from a known state.
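A minimal sketch of the `catch_unwind` option, assuming the default `panic = "unwind"` strategy (with abort-on-panic there is nothing to catch). `buggy_lookup` and `run_isolated` are hypothetical names: the out-of-bounds index panics, the panic is stopped at the task boundary, and the loop carries on with the next item from a known state.

```rust
use std::panic;

// Hypothetical buggy routine: indexing out of bounds panics instead of
// reading invalid memory.
fn buggy_lookup(data: &[u32], i: usize) -> u32 {
    data[i] * 2
}

// Runs one unit of work; if it panics, unwinding stops here and the caller
// gets `None`, so it can continue from a known state.
fn run_isolated(data: &[u32], i: usize) -> Option<u32> {
    panic::catch_unwind(|| buggy_lookup(data, i)).ok()
}

fn main() {
    let data = [1, 2, 3];
    for i in [0, 5, 2] {
        match run_isolated(&data, i) {
            Some(v) => println!("item {i}: {v}"),
            None => println!("item {i}: task panicked; skipping and continuing"),
        }
    }
}
```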

Using `slice.get()` and returning `Option` or `Result` for failures that should never happen in correct code is not an improvement. It leads to the same unwinding behavior, as the failure gets passed up the call stack, but with more manually-written code, and more error-handling paths that are hard or impossible to test (because if the program is correct then they are unreachable). It infects calling functions and changes public APIs.
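To make the contrast concrete, here is a sketch with hypothetical function names, assuming an invariant that `ids` is never empty in correct code. The indexing version panics if the invariant breaks; the `get()` version turns the same "can't happen" case into an `Option` that every caller has to forward, which changes their signatures too.

```rust
// Hypothetical invariant: `ids` is never empty. Indexing panics if that
// invariant is ever broken.
fn first_id(ids: &[u64]) -> u64 {
    ids[0]
}

// The `get()` version must report the "impossible" case in its return type,
// so every caller inherits an error path that correct code never takes.
fn first_id_checked(ids: &[u64]) -> Option<u64> {
    ids.get(0).copied()
}

// Callers now have to propagate the Option too, changing their own API.
fn first_id_plus_one(ids: &[u64]) -> Option<u64> {
    let id = first_id_checked(ids)?;
    Some(id + 1)
}

fn main() {
    let ids: [u64; 1] = [42];
    println!("{}", first_id(&ids));
    println!("{:?}", first_id_plus_one(&ids));
}
```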

Clearly, the best practice is "don't write code with bugs." If your code is bug-free, then it is also panic-free. But we don't really have the tools and techniques to do that all the time in general. The second-best option may be to write code that panics when there is a bug. (People saying "write code that can't panic" are often really just saying "don't write bugs.")

(See also "crash-only software," from the Erlang school of reliability engineering.)

> If you don't want a panic to take down the whole system, you can isolate the code in a thread and use a supervision tree, or use `catch_unwind` to let the thread perform cleanup and then continue from a known state.

Playing devil's advocate: with unwinding panics (which are necessary for these two approaches), it's harder to make sure that all the data structures the thread was using are left in a coherent state. It's not as bad as exception safety in C++, but there are some similarities. Just look at the tricks the Rust standard library uses to keep everything sane while the stack unwinds: helper structs that implement Drop, usually named SomethingOnDrop (for instance CopyOnDrop), or std::sync::Mutex poisoning.

Agreed. There are definitely good arguments for using abort-on-panic, and doing isolation and recovery at the process level rather than the thread level. This is what we do with all the Rust code in Firefox, for example.

Unwind safety is a real issue. I have some personal experience with fixing panic-safety issues in unsafe Rust code:

https://github.com/servo/rust-smallvec/pull/103

I wrote a bit more about it here:

https://users.rust-lang.org/t/c-pitfalls-hard-to-avoid-that-...
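For readers unfamiliar with the SomethingOnDrop pattern mentioned earlier in the thread, here is a generic safe-Rust sketch; it does not reproduce the smallvec fix linked above. `ResetOnDrop` and `with_busy_flag` are hypothetical: the guard's `Drop` impl restores an invariant whether the scope exits normally or by unwinding.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

// Hypothetical guard in the SomethingOnDrop style: restores `flag` to false
// when dropped, on normal exit or during unwinding.
struct ResetOnDrop<'a> {
    flag: &'a Cell<bool>,
}

impl Drop for ResetOnDrop<'_> {
    fn drop(&mut self) {
        self.flag.set(false);
    }
}

// Marks `busy` for the duration of `f`; the guard clears it even if `f`
// panics and the stack unwinds through this frame.
fn with_busy_flag<F: FnOnce()>(busy: &Cell<bool>, f: F) {
    busy.set(true);
    let _guard = ResetOnDrop { flag: busy };
    f();
}

fn main() {
    let busy = Cell::new(false);
    // Cell is not RefUnwindSafe, so we assert unwind safety explicitly.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        with_busy_flag(&busy, || panic!("callback bug"));
    }));
    assert!(result.is_err());
    println!("busy after panic: {}", busy.get()); // false: the guard ran during unwinding
}
```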

Even isolation and recovery at the process level does not guarantee that a multi-process application will have no invalid state! By the time invalid state is detected in one process and a panic happens, the invalid state may have already propagated to another process via IPC messaging.

And even taking down an entire multi-process application doesn't fully protect against invalid state, if that invalid state has wound up in persistent storage.

All recoveries from panics are ultimately heuristics.

Your goal should be for your software to behave correctly regardless of persistent state. That is, all persistent states are valid, or else that's a grave bug.

If the situation is "Program fails when restoring from state X", then "Don't fail when restoring from state X" has a higher priority than "Don't cause whatever happened that results in state X".

Example: Let's say your code expects 'foo' to be a file containing XML. A user reports that something went wrong and now the program crashes; the file 'foo' is now exactly 4096 bytes (one page on most architectures) of binary noise.

Correct priority: #1 Make the program work when 'foo' is not an XML file. If possible (maybe 'foo' stores animated profile pictures for a chat program), it should carry on; if that's impractical (maybe 'foo' defines which model of X-ray machine we're hooked up to, so it's best not to press on without knowing), it should give a clear error explaining what's wrong.

#2 Only after fixing #1, figure out how 'foo' gets corrupted and try to prevent that.
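A sketch of priority #1 under stated assumptions: real code would use an actual XML parser, but here the hypothetical `parse_xml_ish` only checks that the bytes are UTF-8 and start with `<`. The two hypothetical loaders show the two policies described above: fall back to a default for non-critical data, and refuse with a clear error for critical data.

```rust
// Stand-in check (assumption): real code would use an XML parser. Here we
// only require valid UTF-8 that starts with a tag.
fn parse_xml_ish(bytes: &[u8]) -> Result<String, String> {
    let text = std::str::from_utf8(bytes).map_err(|e| format!("not UTF-8: {e}"))?;
    let trimmed = text.trim_start();
    if trimmed.starts_with('<') {
        Ok(trimmed.to_string())
    } else {
        Err("does not look like XML".to_string())
    }
}

// Non-critical data (e.g. cached profile pictures): carry on with a default.
fn load_avatar_cache(bytes: &[u8]) -> String {
    parse_xml_ish(bytes).unwrap_or_default()
}

// Critical data (e.g. which X-ray machine model is attached): don't guess;
// return an error that says exactly what is wrong.
fn load_machine_model(bytes: &[u8]) -> Result<String, String> {
    parse_xml_ish(bytes).map_err(|e| format!("'foo' is corrupt ({e}); not starting"))
}

fn main() {
    let noise = vec![0xFFu8; 4096]; // simulate 4096 bytes of binary noise
    println!("avatars: {:?}", load_avatar_cache(&noise));
    match load_machine_model(&noise) {
        Ok(model) => println!("model: {model}"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```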

Nitpick: this isn't really about persistent state specifically, but about any shared (and especially mutable) state, including, but not limited to, state shared across multiple temporally non-overlapping instances of the same program. E.g., compare the case where 'foo' is provided as input or fetched over a network.

> if that invalid state has wound up in persistent storage.

Doesn't the same apply to a program recovering from a power loss?

> much easier to find through fuzzing

Note that you can fuzz with debug asserts enabled.
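A hedged sketch of what that looks like: `insert_sorted` and `fuzz_one` are hypothetical, and the sortedness check inside `debug_assert!` runs only when debug assertions are enabled (the default for debug builds). A fuzzer built that way turns a broken invariant into a panic it can report, while release builds skip the check.

```rust
// Hypothetical function with an internal invariant: `buf` stays sorted.
// The debug_assert! is compiled out when debug assertions are disabled.
fn insert_sorted(buf: &mut Vec<i32>, x: i32) {
    let pos = buf.partition_point(|&v| v < x);
    buf.insert(pos, x);
    debug_assert!(buf.windows(2).all(|w| w[0] <= w[1]), "buf not sorted: {buf:?}");
}

// Hypothetical fuzz-style driver: feed arbitrary input and let any failed
// debug assertion panic.
fn fuzz_one(input: &[i32]) -> Vec<i32> {
    let mut buf = Vec::new();
    for &x in input {
        insert_sorted(&mut buf, x);
    }
    buf
}

fn main() {
    println!("{:?}", fuzz_one(&[5, -1, 3]));
}
```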

Crash-only software is a fantastic paper that teaches another school of thought: always panic gracefully and be quick to recover. This assumes you have a quick recovery plan for when you do panic; if you do, that's fine (but that's not always the case).
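A minimal crash-only style supervisor, assuming unwinding panics and a job that can always restart from a fresh state. `supervise` is a hypothetical helper: it runs the job on its own thread, treats a panic (an `Err` from `join`) as a crash, and restarts up to `max_restarts` times.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// The only recovery action is to start the job again from a fresh state.
// Returns Ok(result) on success, or Err(number of crashes) if the restart
// budget runs out.
fn supervise<F>(work: F, max_restarts: usize) -> Result<u32, usize>
where
    F: Fn() -> u32 + Send + Sync + 'static,
{
    let work = Arc::new(work);
    for attempt in 0..=max_restarts {
        let w = Arc::clone(&work);
        match thread::spawn(move || (*w)()).join() {
            Ok(value) => return Ok(value),
            Err(_) => eprintln!("worker crashed on attempt {attempt}; restarting"),
        }
    }
    Err(max_restarts + 1)
}

fn main() {
    // Simulated fault: the first two runs panic, the third succeeds.
    let runs = Arc::new(AtomicUsize::new(0));
    let r = Arc::clone(&runs);
    let result = supervise(
        move || {
            if r.fetch_add(1, Ordering::SeqCst) < 2 {
                panic!("transient failure");
            }
            42
        },
        3,
    );
    println!("{result:?} after {} runs", runs.load(Ordering::SeqCst));
}
```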

> Panicking is, in many ways, the best-case scenario for code that contains a bug.

The very opposite: it's among the worst behaviors, short of creating a vulnerability.

From any modern language, we should expect an exception to be raised.

"Panic" is essentially the Rust name for exceptions. They are even implemented the same as C++ exceptions in terms of codegen. The real differences are cultural rather than technical, as I touch on in this thread:

https://users.rust-lang.org/t/c-pitfalls-hard-to-avoid-that-...

No, it does not have the same semantics as exceptions.

Care to give an example or two?

Rust panics, like C++ exceptions, can unwind the stack and invoke Drop/destructors.

Rust panics, like C++ exceptions, can be caught, inspected, and recovered from - using std::panic::catch_unwind instead of try/catch statements.

Rust panics can be rethrown with std::panic::resume_unwind, just as you can rethrow C++ exceptions with "throw;".
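A sketch of catch, inspect, rethrow, assuming unwinding panics. `log_and_rethrow` is a hypothetical helper: it downcasts the payload to the two common message types, then hands the original payload to `std::panic::resume_unwind`, which continues unwinding without invoking the panic hook again.

```rust
use std::panic::{self, AssertUnwindSafe};

// Runs `f`; on panic, logs the message if it is a string and rethrows the
// same payload, similar to a bare `throw;` inside a C++ catch block.
fn log_and_rethrow<F: FnOnce() -> R, R>(f: F) -> R {
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(value) => value,
        Err(payload) => {
            // Typically `panic!("literal")` carries &str; formatted ones carry String.
            if let Some(s) = payload.downcast_ref::<&str>() {
                eprintln!("caught panic: {s}");
            } else if let Some(s) = payload.downcast_ref::<String>() {
                eprintln!("caught panic: {s}");
            }
            panic::resume_unwind(payload) // rethrow the original payload
        }
    }
}

fn main() {
    let outer = panic::catch_unwind(|| log_and_rethrow(|| -> i32 { panic!("inner failure") }));
    println!("outer caller also saw the panic: {}", outer.is_err());
}
```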

Rust panic payloads can be a wide variety of types - anything that conforms to "dyn Any + Send + 'static" - just as you can throw a wide variety of exception types in C++. While Rust panic payloads are typically a &str or String, they're not limited to that, and C++ lets you throw C-strings or std::string too.
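A sketch of a non-string payload, using `std::panic::panic_any` to panic with a hypothetical `ParseFailure` struct and `downcast` to recover it by type. Payloads of any other type are rethrown, roughly like a C++ catch clause whose type doesn't match.

```rust
use std::panic;

// Hypothetical payload type; anything that is Any + Send + 'static works.
#[derive(Debug, PartialEq)]
struct ParseFailure {
    line: u32,
}

// Panics with a typed value instead of a message when it hits line 3.
fn parse_step(line: u32) -> u32 {
    if line == 3 {
        panic::panic_any(ParseFailure { line });
    }
    line
}

// Catches the panic and recovers the typed payload; other payloads propagate.
fn run(lines: u32) -> Result<u32, ParseFailure> {
    panic::catch_unwind(|| (1..=lines).map(parse_step).sum::<u32>()).map_err(|payload| {
        match payload.downcast::<ParseFailure>() {
            Ok(failure) => *failure,
            Err(other) => panic::resume_unwind(other), // not ours: rethrow
        }
    })
}

fn main() {
    println!("{:?}", run(2));
    println!("{:?}", run(5));
}
```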

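An unhandled Rust panic on the main thread terminates the application (on a spawned thread it ends only that thread, and shows up as an Err from JoinHandle::join); an unhandled C++ exception calls std::terminate, which by default invokes abort.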

It's discouraged to use Rust panics for general control flow, but that's cultural rather than semantic - you can totally use them for control flow. Nothing is stopping you, except hopefully your code reviewer.
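To show that nothing technically prevents it, here is a sketch (hypothetical names, assuming unwinding panics) that uses a panic to return early from a recursive search. It works, but each "return" goes through the panic hook (which by default prints a message to stderr), and it stops working as intended if the crate is built with panic = "abort", which is part of why reviewers push back.

```rust
use std::panic;

// Marker payload used only to carry the answer out of deep recursion.
struct Found(u32);

// Depth-first walk over an implicit binary tree (children 2n and 2n+1),
// "returning" the answer by panicking with it.
fn search(node: u32, target: u32, limit: u32) {
    if node == target {
        panic::panic_any(Found(node));
    }
    if node < limit {
        search(node * 2, target, limit);
        search(node * 2 + 1, target, limit);
    }
}

fn find(target: u32, limit: u32) -> Option<u32> {
    match panic::catch_unwind(|| search(1, target, limit)) {
        Ok(()) => None, // search completed without finding the target
        Err(payload) => match payload.downcast::<Found>() {
            Ok(found) => Some(found.0),
            Err(other) => panic::resume_unwind(other), // a real bug: rethrow
        },
    }
}

fn main() {
    println!("{:?} {:?}", find(6, 10), find(100, 10));
}
```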

Rust panics can be configured to abort instead of unwinding, but that's just a cleaner alternative to C++ compilers typically giving you the option to disable exceptions entirely - and it's not unheard of for a C++ library to wrap exception throwing with macros, such that these can abort instead when built without exception handling support.