by ragnese 546 days ago

You're not far off. This is one of my favorite topics in programming language design discussions, and I have opinions that some may even say are "controversial". For what it's worth, I've been writing Rust in production since 2016 (not 100% of my time since then, but I've had a good amount of experience with some decently long-lived projects of varying complexity).

First, I assert that Java's checked exceptions are a solidly good feature. Of course it has flaws. The whole rest of the language is also full of flaws, so that's not surprising.

Second, I assert that there are two things that have caused the vast majority of hate toward Java's checked exceptions: programmers not being taught/shown how and when they're intended to be used, and that oft-circulated interview transcript from 2003 where Anders Hejlsberg asserts that checked exceptions are a language design "dead end". I don't think he was right in 2003, and I especially don't think the opinion is correct today in light of how much strong static typing has really gained favor with the programming community. But, that opinion really took off and we spent years and years seeing that assessment repeated as a truism, which I think is why it took so long to finally start experimenting with statically typed failure modes again (e.g., Rust and Swift).

Now, here's where I'll get controversial about Rust error handling. I'll try really hard to keep this from turning into an entire dissertation, but I'll elaborate if anyone asks.

It is often a mistake to implement the `From` trait for error types and use the `?` operator everywhere. Error types in an API need to be aware of the context in which they occur, so just converting by type only often doesn't make sense. You may encounter a `FooError` type while your app is doing totally different things, so it's likely that not every `FooError` occurrence means the same thing to whoever is calling into your code. Also, sometimes you can actually handle an error, and getting into the muscle memory habit of just tacking `?` on to everything can lead to mistakenly propagating errors that you might have better handled by doing something else (including perhaps panicking).
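To make that concrete, here is a minimal sketch of mapping errors by context instead of by type. Everything in it (`SettingsError`, `load_port`, the cache/config split) is a hypothetical example, not from any real codebase. The same `io::Error` shows up twice: once where it can be handled locally, and once where the caller needs to hear about it.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical domain error for loading a port setting.
#[derive(Debug)]
pub enum SettingsError {
    /// The config file could not be read at all.
    ConfigUnreadable(io::Error),
    /// The config file was read, but its contents are not a valid port.
    ConfigInvalid(String),
}

// Deliberately no `impl From<io::Error> for SettingsError`:
// an `io::Error` means different things at the two call sites below.

pub fn load_port(config_path: &Path, cache_path: &Path) -> Result<u16, SettingsError> {
    // An unreadable or stale cache is handled right here: just fall through
    // to the config file. The caller never hears about this `io::Error`.
    if let Ok(cached) = fs::read_to_string(cache_path) {
        if let Ok(port) = cached.trim().parse::<u16>() {
            return Ok(port);
        }
    }

    // An unreadable config file is the caller's problem, so it is mapped
    // explicitly into the variant that describes this context.
    let text = fs::read_to_string(config_path).map_err(SettingsError::ConfigUnreadable)?;
    text.trim()
        .parse::<u16>()
        .map_err(|e| SettingsError::ConfigInvalid(e.to_string()))
}
```

With a blanket `From<io::Error>` and a reflexive `?`, the cache read would have propagated an error that the function is perfectly able to handle itself.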

There does seem to be a trend toward automatically adding stack traces in Rust errors. This is completely misguided, IMO. And this may be my MOST controversial opinion: stack traces almost *never* belong in a `Result<>` error type. Result types should be relevant to your "domain" (borrowing the term from "Domain Driven Design" even though I do NOT advocate for DDD in general).

Think about it this way: designing an API is about abstraction. So if you write an integer division function that takes two arguments and divides them, it might return `Result<i64, DivideByZero>`. If the caller passes in a 0 divisor, then what business is it of theirs to see what your private functions are called, how many of them are called, and what line of your file they were defined on? That's the leakiest of leaky abstractions.
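A minimal sketch of that division function, assuming a unit struct for the error (the names `divide` and `DivideByZero` are just for illustration):

```rust
/// The only failure the caller needs to know about. It carries no stack
/// trace and no implementation details, just the domain fact.
#[derive(Debug, PartialEq, Eq)]
pub struct DivideByZero;

pub fn divide(dividend: i64, divisor: i64) -> Result<i64, DivideByZero> {
    if divisor == 0 {
        return Err(DivideByZero);
    }
    // Note: `i64::MIN / -1` overflows and panics; this sketch does not
    // model that case in the error type.
    Ok(dividend / divisor)
}
```

The error value tells the caller everything they can act on: they passed a zero divisor.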

You might be thinking: "But, if I see a result/error value that I didn't expect while running my program, the stack trace will help me track down the issue!" Yeah, no kidding. So, let's also start adding stack traces to our successful values, too! After all, if I call my division function and get back a `Result::Ok` with a weird number that I didn't expect, I might want to trace that back, too, right? (This suggestion is sarcastic to prove a point. It should, hopefully, sound ridiculous to add stack traces to every return value from every function.)

The issue is that Rust's Result (and Java's checked exceptions) require a different paradigm. A Result is in the type signature because it's part of your domain's API design. It's just values. It's not *for* debugging. You use a debugger for that or programmatically panic when something is truly unexpected and get the stack trace from that.

Which leads to the corollary to the previous controversial opinion: Rust has unchecked exceptions; they're called panics and they are 100% *okay to use* in the vast majority of applications that the vast majority of day-job programmers work on.

Obviously, context matters, and there are some places where panicking is unacceptable. But, Result is for expected domain failures. Panics are for programmer errors and unrecoverable constraint violations. And I'm not advocating for panics to be "lazy". Rust code that refuses to ever panic (as far as they know, but I hope they aren't indexing any vecs/arrays just in case!) usually leads to overly polluted error types where it ends up being difficult to understand what errors are actually meaningful and what errors are never actually going to happen. Instead of inspecting errors and figuring out which to handle and how, I've seen things just snowball into a giant mess of nested enums with sometimes redundant error "branches" and missed opportunities to actually handle some cases. If you, as the programmer, know for sure that you just added something to a HashMap earlier in your function and you know you didn't remove it, then for the love of all things sacred, just write `map.get("my-key").unwrap()` (or `.expect("message")`--whatever) instead of making the caller have to consider an error that will never happen, is not their fault, and that they can't do anything about!
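Here is a small sketch of the HashMap case, with a made-up `resolve_port` function. The `"port"` key is inserted at the top of the function and never removed, so a missing key can only mean a bug in this function:

```rust
use std::collections::HashMap;

/// Hypothetical helper: start from a default port and apply overrides.
pub fn resolve_port(overrides: &[(&str, u16)]) -> u16 {
    let mut settings: HashMap<&str, u16> = HashMap::new();
    settings.insert("port", 8080); // the default is always present

    for &(key, value) in overrides {
        settings.insert(key, value);
    }

    // No `Result` in the signature: the caller cannot cause this lookup to
    // fail and could not do anything about it if it did.
    *settings
        .get("port")
        .expect("\"port\" was inserted above and never removed")
}
```

The alternative would be returning something like `Result<u16, MissingKey>` and forcing every caller to write handling code for a branch that can never run.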

And, if you do have a situation where panicking is unacceptable (you must be using `#![no_std]`, right??), then don't make a bunch of different error types for all of the possible programmer bugs. Just make a single umbrella `FatalError` type and use that.
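As a rough sketch of what I mean by an umbrella type (all names here are hypothetical, and it uses `Vec` from std for brevity even though a real no-panic context might not have it): the domain failure gets its own variant, and every "this should be impossible" case is funneled into the same `FatalError`.

```rust
/// One umbrella type for every "should be impossible" bug.
#[derive(Debug, PartialEq, Eq)]
pub struct FatalError(pub &'static str);

#[derive(Debug, PartialEq, Eq)]
pub enum QueueError {
    /// Expected domain failure: nothing left to pop.
    Empty,
    /// A programmer bug, reported instead of panicking.
    Fatal(FatalError),
}

pub struct Queue {
    items: Vec<u8>,
    head: usize, // invariant: head <= items.len()
}

impl Queue {
    pub fn new(items: Vec<u8>) -> Self {
        Queue { items, head: 0 }
    }

    pub fn pop(&mut self) -> Result<u8, QueueError> {
        if self.head == self.items.len() {
            return Err(QueueError::Empty);
        }
        // `get` instead of indexing, so a broken invariant becomes a
        // `FatalError` value rather than a panic.
        let value = *self
            .items
            .get(self.head)
            .ok_or(QueueError::Fatal(FatalError("head index past end of queue")))?;
        self.head += 1;
        Ok(value)
    }
}
```

Callers then match on `Empty` when they care and treat `Fatal(_)` as one opaque "something is broken" case, rather than wading through a separate variant per bug.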

For further reading, I really like this piece from the book Real World OCaml, which also has a Result type and exceptions: https://dev.realworldocaml.org/error-handling.html. Specifically, the very last section at the bottom of the page, titled: "Choosing an Error-Handling Strategy". (The old version of that page used to be more plain HTML and the sections had anchors so I could link directly to that section...)

And for further reading about error handling strategy in a no-panic context, I really like the approach described here: https://sled.rs/errors

ekimekim 546 days ago

> Result is for expected domain failures. Panics are for programmer errors and unrecoverable constraint violations.

The problem is that "unrecoverable constraint violations" happen a lot in practice when you're dealing with filesystems, networking...anything that isn't pure computation.

Suppose I have a function that calls other functions that themselves make 3 database queries, two HTTP requests, and reads/writes from a cache directory. It considers all of them (except perhaps the caching) unrecoverable in the context of that function. What should it do?

I see three reasonable options:

(1). return a simple error type saying "Networking failure", "IO Error", etc if any of those fail

(2). return a complex error type that exposes the internal details of all the different things it's doing and which one failed and why

(3). panic if any of them fail

I would argue that (1) is unfit for purpose as you have no idea what's actually going wrong.

And (3) is currently very heavily discouraged, though I think if I'm understanding your argument right it probably makes the most sense. However it leaves your top-level function in the awkward position of needing to make that panic part of its API contract, without the type system to help. It's also highly limiting because the caller now can't distinguish between programmer errors and possibly-transient environmental conditions like a service outage.

(2) is what I'd expect to see in practice right now, and that's what leads to these automatic stack traces, etc. But none of these feel like good options. Ideally I'd want something that is:

- Debuggable (like (2) and (3))

- Part of the type system (like (1) and (2))

- Still allows introspection by the caller (like (1) and (2))

- Doesn't require a ton of boilerplate at each level (like (3), and possibly (1))

(edited for formatting)

kelnos 546 days ago

No, I don't think you understood the GP's argument. Network and filesystem errors are not always "unrecoverable constraint violations". They're often just simple errors -- things that you should expect to happen, even -- and your (1), or, better, (2), are the most appropriate reactions to those.

"Unrecoverable constraint violations" occur, for example, when you've done a sanity check on some data structure and found that it's in a state that should be impossible, and so continuing from there is unsafe.

Even then, you may choose to handle them in a better way than simply aborting the program. For example, if I'm writing an HTTP service that is backed by a database, and I get a customer request that results in me finding that a column in the database is NULL when it shouldn't be, I'll probably just return a 500 error to the customer rather than panic!(). The assumption is that even though there's a problem with this particular data, that might be the result of an almost-never-hit edge case, and we can still serve other customer requests just fine.

Sure, a simple single-user command-line application may choose to panic!() if a critical data file can't be opened from the filesystem. Maybe that is an "unrecoverable constraint violation" sometimes. But I think there's a lot of nuance you're missing.

ragnese 545 days ago

It's fun to think through examples like this. But, of course, we need to exercise caution because so much is dependent on the specific contexts of each individual project.

First, I will say that I probably misspoke (mistyped...?) by using the word "unrecoverable". At the end of the day, it's not even really about whether or not something is recoverable, but it's really just about whether the caller might "want" to be aware of it and how much detail the caller needs.

For your example, you end up writing,

> It's also highly limiting because the caller now can't distinguish between programmer errors and possibly-transient environmental conditions like a service outage.

That's the giveaway that the caller needs to know about service outages, specifically. So, you need to handle your HTTP requests and/or database queries in such a way that you can incorporate some of the failures into your function's error type.

But, you SHOULD NOT just implement `From` for converting all of your database library's errors into your function's error type. You have to actually inspect the error returned from the database and return an appropriate error. Specifically, if you're using a SQL db library, it might return an error if your query generates an invalid SQL statement--that should be a panic because that's not a "service outage", that's a programmer bug in the implementation of the function. Likewise, an auth error is not the same as an outage. If the db library specifically returns an error that it can't make a connection, then that's the one you'd want to wrap in your error type in this example.
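Roughly, in code. `DbError` here is a made-up stand-in for whatever your database library returns (real libraries have their own, richer error types), and `FetchError`, `classify`, and `fetch_user_name` are hypothetical too:

```rust
/// Stand-in for a database library's error type (an assumption for
/// illustration, not any real crate's API).
#[derive(Debug)]
pub enum DbError {
    ConnectionFailed(String),
    AuthFailed,
    InvalidSql(String),
}

/// What the caller of our function actually needs to distinguish.
#[derive(Debug, PartialEq, Eq)]
pub enum FetchError {
    ServiceUnavailable,
    NotAuthorized,
}

/// Inspect the library error instead of converting it by type alone.
pub fn classify(err: DbError) -> FetchError {
    match err {
        // A possibly-transient outage: the caller wants to know.
        DbError::ConnectionFailed(_) => FetchError::ServiceUnavailable,
        // Not an outage, so it gets its own variant.
        DbError::AuthFailed => FetchError::NotAuthorized,
        // We generated the SQL ourselves, so this is our bug, not a
        // condition the caller can do anything about.
        DbError::InvalidSql(sql) => panic!("bug: generated invalid SQL: {}", sql),
    }
}

/// `run_query` stands in for the real database call.
pub fn fetch_user_name(
    run_query: impl FnOnce(&str) -> Result<String, DbError>,
) -> Result<String, FetchError> {
    run_query("SELECT name FROM users WHERE id = 1").map_err(classify)
}
```

The `match` in `classify` is the part a blanket `From` impl would skip: someone has to decide, per variant, whether the caller should see it.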

But, again, it all depends on exactly what kind of project we're working on. Your example of doing HTTP, and filesystem, and database queries reminds me of Firefox. Firefox obviously does HTTP stuff, and it uses the filesystem and a SQLite database for settings or configs or something... So, if we were talking about your example function in the context of writing a web browser, then a failed HTTP request is 100% normal and expected because the user's device might connect and disconnect from the internet at any time. So, HTTP failures should be represented in the function's signature. However, since the SQLite database is basically part of the application, itself, any errors when trying to query it are probably panic-worthy. Phrased differently: it's a working assumption of the application that the database is always accessible, so there's no reason to describe failure modes that aren't supposed to ever happen. If the database ever became inaccessible, the top-level main function should catch all panics, log something about them (maybe send off telemetry data, etc), and warn the user that an unexpected error occurred and either tell them to restart the app or just kill ourselves, etc.
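The "top-level catches all panics" part could look something like this sketch. `run_guarded` is a hypothetical name; `std::panic::catch_unwind` is the real standard library function, with the caveat that it only works when panics unwind (not with `panic = "abort"`).

```rust
use std::panic;

/// Run the whole application body and turn any panic into a message that
/// the top level can log, report, or show to the user.
pub fn run_guarded<T, F>(app: F) -> Result<T, String>
where
    F: FnOnce() -> T + panic::UnwindSafe,
{
    panic::catch_unwind(app).map_err(|payload| {
        // Panic payloads are usually a `&str` or a `String`.
        payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "unknown panic".to_string())
    })
}
```

A `main` function would call `run_guarded` once around the event loop, log the message, warn the user, and exit. Nothing in the "middle" layers needs to know about it.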

Have you ever written a function that returned a `String`? Or a `Vec`? Well, those require memory allocations and they may fail and panic. But, I've never worked in a context where it made sense to try to catch those panics and change those function signatures into `Result<String, OOM>`. My applications choose to assume that enough memory will be available, and I've made the decision to allow the apps to crash and burn if that assumption ends up violated rather than add the large burden of carefully handling that possibility in every line of code in these projects. And, so far, that has been the right call because none of my Rust projects have ever OOM'd yet (and some have literally been running in production for multiple years), and there's really nothing I would want to specifically do if they did-- I'd either figure out how to reduce the memory requirements or increase the server's memory.

kelnos 546 days ago

> First, I assert that Java's checked exceptions are a solidly good feature.

I agree in theory, but I think they're very poorly implemented, and the syntax and tooling around handling them is terrible. And, frankly, those flaws (yes, I agree everything has flaws) make the overall feature mostly useless, unfortunately. It really doesn't matter where you think all the hate comes from; the hate is there, and it means that very few people use checked exceptions, except for where they're required to when stdlib methods throw them. Ultimately that's all that matters. If no one uses the feature, then it's not a useful feature, regardless of the reasons.

> The issue is that Rust's Result (and Java's checked exceptions) require a different paradigm. A Result is in the type signature because it's part of your domain's API design.

Correct, but in Java, checked exceptions are also a part of the API and ABI, so there's really little difference there, outside of ergonomics. (Which IMO are one of the most important parts!)

> (This suggestion is sarcastic to prove a point. It should, hopefully, sound ridiculous to add stack traces to every return value from every function.)

I don't think that proves a point. Sure, you can argue every proposal into absurdity; it doesn't make the suggestion itself bad.

> Rust has unchecked exceptions; they're called panics and they are 100% *okay to use* in the vast majority of applications that the vast majority of day-job programmers work on.

Yes, and this really bothers me. I wish more people would annotate their functions with `#[no_panic]`. Actually, I wish that was the default, and if you want to write a function that panics or calls functions that can panic, you need to annotate the function with `#[can_panic]`, and the compiler should enforce that, and `rustdoc` should surface that in all documentation.

ThatGeoGuy 546 days ago

> You might be thinking: "But, if I see a result/error value that I didn't expect while running my program, the stack trace will help me track down the issue!" Yeah, no kidding. So, let's also start adding stack traces to our successful values, too! After all, if I call my division function and get back a `Result::Ok` with a weird number that I didn't expect, I might want to trace that back, too, right? (This suggestion is sarcastic to prove a point. It should, hopefully, sound ridiculous to add stack traces to every return value from every function.)

I don't think I disagree with the ends you're proposing (don't add stack traces to every value, don't add stack traces specifically to Result::Err(E) variants); however, this is a bad way to justify it. Tools like dtrace / bpftrace do exactly this kind of stack tracing for both success and error cases across entire systems. This is a good thing™, and is actually very useful for debugging, performance profiling, and understanding what your code is really doing on the hardware.

So I guess I disagree with how you're framing it. I would argue that adding stack traces to every value in Rust would be bad because it is a lot of overhead for something your kernel can and will do better.

> The issue is that Rust's Result (and Java's checked exceptions) require a different paradigm. A Result is in the type signature because it's part of your domain's API design. It's just values. It's not *for* debugging. You use a debugger for that or programmatically panic when something is truly unexpected and get the stack trace from that.

This really is the gist of it. However, I will say that in my experience the reason that Result types are nice (over e.g. exceptions) is that putting the error cases in the type contract means that you can have the compiler check when someone hasn't handled an error case (? and unwrap are "handling" it even if they may not always be appropriate), as well as statically verify which variants may be unused. One very frustrating thing I've had to encounter in C++ is finding a whole list of different errors that have been duplicated as multiple different opaque (e.g. behind a unique_ptr<std::exception> or some such) exceptions across the codebase.
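A tiny sketch of what that compiler checking looks like in practice (`ParseError`, `parse_percent`, and `describe` are made-up names): the `match` below has no wildcard arm, so adding a new variant to `ParseError` later makes this function fail to compile until the new case is handled.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    Empty,
    NotANumber,
    OutOfRange,
}

pub fn parse_percent(input: &str) -> Result<u8, ParseError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(ParseError::Empty);
    }
    let value: u32 = trimmed.parse().map_err(|_| ParseError::NotANumber)?;
    if value > 100 {
        return Err(ParseError::OutOfRange);
    }
    Ok(value as u8) // safe: value is at most 100 here
}

pub fn describe(input: &str) -> String {
    // Exhaustive match: every variant is visibly handled.
    match parse_percent(input) {
        Ok(p) => format!("{}%", p),
        Err(ParseError::Empty) => "no input".to_string(),
        Err(ParseError::NotANumber) => "not a number".to_string(),
        Err(ParseError::OutOfRange) => "must be 0-100".to_string(),
    }
}
```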

Being able to know what variants of error can come out of an API is great! It just happens that working with a rich type system like Rust makes it possible to do all manner of things that languages-with-only-exceptions cannot.

ragnese 546 days ago

Yeah, fair point about dtrace, et al, but I think my statement is still fine in context, since we're specifically talking about these Rust libraries that collect stack traces for error types.

And I agree and love having statically checked failure modes! So, if you're choosing to panic in Rust, it better be because of something that is really not able to be handled at all (caveat: the top-level event loop or whatever could catch panics/exceptions, print an "Oops! Something went wrong!" message to the user and then either die or try to keep going, etc, but no handling panics/exceptions in "middle" layers).

agos 546 days ago

Characterizing people who dislike checked exceptions as either bad programmers or unable to form their own opinion on the matter does not do a great service to your argument.

ragnese 546 days ago

Yeah, that whole statement there is probably unnecessary and I can see it being off-putting. I'll edit it if I still can.

However, I just want to make it clear that I wasn't intending to call anyone a "bad programmer". At least not in a personally insulting way. We've all been in a position where we were uninitiated at something. And most of us have been in a situation where we've jumped into a new programming language without having any kind of "formal" education on the design, philosophy, and intended best practices. For example, with Java, one should read documents like: https://docs.oracle.com/javase/tutorial/essential/exceptions..., especially this part: https://docs.oracle.com/javase/tutorial/essential/exceptions....

So, again, that part wasn't actually meant as an insult. We're all uneducated about many things at every point in our lives. And I think that lack of education or guidance on designing error types and handling has caused a lot of people to end up burying themselves in checked exception hell, and dismissing the whole thing because of that frustration.

The other part about cargo-culting... well, yeah, that was me insulting people.