by quotemstr 3741 days ago
No custom allocator can make a task that fails to allocate gracefully report an error. Rust's error handling design is just terrible, and mostly a consequence of eschewing exceptions.

Had Rust opted for exceptions, it'd be a much better, and actually usable, language. Rust's terrible error-handling strategy is the chief reason not to use it.


> No custom allocator can make a task that fails to allocate gracefully report an error.

Yes, you can. You can panic the thread, and you can recover from panics. But honestly this is never enough to gracefully recover from OOM. I can't think of any software that uses exceptions to gracefully recover from OOM (i.e. without crashing the process) in a way that works. How many Java applications do you see that catch OutOfMemoryError on a fine-grained level?
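
A minimal sketch of "panic the thread, then recover", assuming a current stable toolchain. The helper name `run_isolated` is hypothetical, and the panic here is simulated rather than caused by a real allocation failure:

```rust
use std::thread;

/// Runs `job` on its own thread and reports a panic as an `Err`
/// instead of taking the whole process down.
/// (`run_isolated` is a hypothetical helper name.)
fn run_isolated<F>(job: F) -> Result<u64, String>
where
    F: FnOnce() -> u64 + Send + 'static,
{
    // `join` returns `Err` carrying the panic payload if the thread panicked.
    match thread::spawn(job).join() {
        Ok(value) => Ok(value),
        Err(payload) => {
            // Panic payloads are usually a `&str` or a `String`.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| (*s).to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            Err(msg)
        }
    }
}
```

Calling `run_isolated(|| panic!("simulated failure"))` yields an `Err` with the message, while the calling thread keeps running.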

> Had Rust opted for exceptions, it'd be a much better, and actually usable, language. Rust's terrible error-handling strategy is the chief reason not to use it.

It's hyperbolic to call Rust's error handling strategy not "actually usable". I use it every day and never have any issues with it.

> How many Java applications do you see that catch OutOfMemoryError on a fine-grained level?

I wrote exactly that sort of code, in Java, last week.

I believe you that you do. That doesn't change the fact that few people actually do it or need to do it. There are many more people who think they need to recover from OOM but actually don't and would be better off if they didn't try. (There have been many security vulnerabilities that have resulted from attempting to handle OOM gracefully that wouldn't have been an issue if malloc just aborted the process.)
> That doesn't change the fact that few people actually do it or need to do it.

I am one of those few people. We are running some scientific code (currently, unfortunately, written in C++) on a heterogeneous bunch of compute nodes. Some computations can be extremely memory-intensive, and sometimes in ways that we didn't predict. So it's useful to be able to fail gracefully and record that computation X on node Y with input parameters [Z] failed specifically due to running out of memory at step W - so that e.g. the queue manager can try relaunching the computation on a beefier node or adjusting how many instances of which computation are allowed on Y.

That's something that Rust fully supports with panic handlers. Doing arbitrary work before the process goes down is useful and supported. (But you will have to be running Linux in a non-default configuration for it to be reliable, of course!)
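
A rough sketch of the "do some work before going down" idea, using `std::panic::set_hook` from today's standard library. The names `FAILURE_RECORDED`, `install_failure_hook`, and `failure_was_recorded` are made up for illustration, and the sketch only covers ordinary panics. Whether a real allocation failure ever reaches the hook depends on how the allocator reports failure, which this example does not address:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical flag standing in for "write a failure record for the queue manager".
static FAILURE_RECORDED: AtomicBool = AtomicBool::new(false);

/// Installs a process-wide panic hook that runs before unwinding (or aborting).
fn install_failure_hook() {
    panic::set_hook(Box::new(|info| {
        // A real job runner might record the node, inputs, and step here.
        FAILURE_RECORDED.store(true, Ordering::SeqCst);
        eprintln!("job failed: {}", info);
    }));
}

/// Reports whether the hook has observed a panic.
fn failure_was_recorded() -> bool {
    FAILURE_RECORDED.load(Ordering::SeqCst)
}
```
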
Thanks, that's good to know!

> But you will have to be running Linux in a non-default configuration for it to be reliable, of course!

Of course. It does seem to work in practice with our C++ code, although probably that's due to our usage pattern.

I agree with you, but a general-purpose systems programming language needs to let _me_ make that determination. It can't abort on my behalf for my own good. It's depressing that Java, of all things, does a better job in this respect than Rust.

And we probably shouldn't be writing so much software in general-purpose systems programming languages.

> I agree with you, but a general-purpose systems programming language needs to let _me_ make that determination. It can't abort on my behalf for my own good.

You can decide. You can use the standard library and deal with exceptions, or you can not use the standard library and deal with malloc failure yourself. The Rust standard library is opinionated in this regard, because it's rarely ever a good idea to try to recover from malloc failure for userland processes.

That said, with recover, which will probably be stabilized, you can recover from malloc problems, which are turned into panics. But I'm sure you know that this can be unreliable on Linux with the default overcommit turned on, and so forth. https://doc.rust-lang.org/std/panic/fn.recover.html

Note all of the debate on the linked issue as to whether recover is a good idea. Most of the Rust community is very hesitant to even allow catching panics at all; they certainly don't find the current situation "unusable".

> It's depressing that Java, of all things, does a better job in this respect than Rust.

I think that Java shouldn't throw exceptions on OOM. It should just abort the process.

I profoundly disagree with your assertions about the correct way to handle malloc failure. While abort may be acceptable for some specific applications, general-purpose systems don't get to impose that opinion on programmers. Memory is just another resource, and programs need to deal with resource exhaustion generally. Do you think programs should abort when the disk fills up?
> and mostly a consequence of eschewing exceptions.

A panic is the same thing as an exception. If you want to catch a panic, use recover(), it's meant to be used exactly for these end-of-the-world panic scenarios (and for FFI/etc).
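
For illustration, a small sketch of catching a panic. It assumes a current stable toolchain, where this functionality is exposed as `std::panic::catch_unwind` rather than under the `recover` name used in this thread. `risky_step` and `run_step` are hypothetical names:

```rust
use std::panic;

/// A step that panics on input it cannot handle.
fn risky_step(input: i32) -> i32 {
    if input < 0 {
        panic!("negative input: {}", input);
    }
    input * 2
}

/// Runs the step and turns a panic into `None` instead of unwinding further.
fn run_step(input: i32) -> Option<i32> {
    // `catch_unwind` returns `Err` with the panic payload if the closure panicked.
    panic::catch_unwind(move || risky_step(input)).ok()
}
```

This only works when panics unwind. If the program is built so that panics abort, there is nothing to catch.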

You can plug in a custom allocator that panics on OOM (I think the standard one aborts).

As Steve mentioned, custom allocators can mean two things. The type that exists in Rust today is one where you can make OOM panic, but not have allocation methods return Result. A planned extension will let you have allocators with different semantics entirely work with stdlib types (via defaulted type parameters), and this will let you use regular error handling with stdlib types too.

> A planned extension will let you have allocators with different semantics entirely work with stdlib types

And what about all the code that doesn't? It's because so much code exists that's completely oblivious to the possibility of these stdlib functions failing that I don't think that merely adding the option to do the right thing is good enough. The existing failure-oblivious APIs need to be explicitly deprecated.

The only ways to redeem Rust are to either support exceptions as first-class citizens with mandatory runtime support or to convert all existing allocating stdlib functions to return Result and mark all the existing failure-oblivious ones as being as deprecated as gets(3) in C.

> The existing failure-oblivious APIs need to be explicitly deprecated.

That's total overkill. For 99% of applications, process abort is fine, and dealing with it is just noise. Those 1% are usually things like kernels that use custom standard libraries anyway.

We're not doing the 99% a favor by making them think about OOM every time they do something that might allocate.

> The only ways to redeem Rust are to either support exceptions as first-class citizens with mandatory runtime support or to convert all existing allocating stdlib functions to return Result and mark all the existing failure-oblivious ones as being as deprecated as gets(3) in C.

This is silly hyperbole. Ask anyone who works in security whether the danger of xmalloc() is comparable to the danger of gets(). In fact, I've seen many security folks recommend only using xmalloc() with process abort instead of trying to explicitly handle OOM failures!

Even C++ doesn't have mandatory exception support, and even Rust can catch panics from failure-oblivious code.
> Even C++ doesn't have mandatory exception support

Yes it does. That some compilers provide a way to disable mandatory language features is no argument.

> even Rust can catch panics from failure-oblivious code.

Not while maintaining that code's invariants it can't.

> Yes it does. That some compilers provide a way to disable mandatory language features is no argument.

It's actually very relevant that huge amounts of C++ deployed in the world use -fno-exceptions, and many shops (for example, Google!) have a policy of "we do not use exceptions". I don't care about how well languages handle OOM in theory; what matters is how well they handle it in practice.

> for example, Google!

Google's C++ coding standards have done tremendous harm to the C++ community by perpetuating obsolete programming practices like two-phase initialization and lossy error reporting. Google's C++ standards also teach people that it's okay to use the STL and not worry about allocation failure, which hurts program robustness generally.

I'm not the only one who thinks so: see https://www.linkedin.com/pulse/20140503193653-3046051-why-go...

My C++ code is exceptional, modern, and robust, and anyone using -fno-exceptions can go fly a kite.

Are you saying C++ makes it easy to write exception-safe code? Because Rust explicitly encodes exception safety into the type system with the RecoverSafe trait, you need to write unsafe code to bypass that, and the documentation on unsafe explicitly covers exception safety.
Rust doesn't consider exception safety to be a matter worth 'unsafe's time. All code must simply be memory-safe in the face of unwinding. RecoverSafe is basically "it's hard to leak busted state out of a region of code that panicked". That is, mutable references aren't RecoverSafe, and mutexes and the like poison their contents if they witness a panic while locked.

But RecoverSafe is just preventing things like "your binary heap was only partially heapified" and not "your heap is now full of uninitialized memory". You can get poisoned values out of mutexes just fine, so everything needs to put itself in a memory-safe state if a panic occurs.

One can bypass RecoverSafe in safe code with the AssertRecoverSafe wrapper.
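
A sketch of that bypass, assuming a current stable toolchain, where the trait and wrapper are named `std::panic::UnwindSafe` and `std::panic::AssertUnwindSafe`. The function `push_then_maybe_panic` is hypothetical. It shows the kind of half-finished state the marker trait is meant to make hard to observe by accident:

```rust
use std::panic::{self, AssertUnwindSafe};

/// Mutates `log` inside a closure that may panic part-way through.
/// Returns `true` if the closure ran to completion.
fn push_then_maybe_panic(log: &mut Vec<u32>, fail: bool) -> bool {
    // A closure capturing `&mut Vec<u32>` is not `UnwindSafe`, so passing it
    // to `catch_unwind` directly would not compile. The wrapper is an explicit,
    // safe-code assertion that observing a partial update is acceptable.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        log.push(1);
        if fail {
            panic!("simulated failure after a partial update");
        }
        log.push(2);
    }));
    result.is_ok()
}
```

After a failed call the vector holds only the first element: memory-safe, but logically incomplete.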

It does however turn out that safe code in Rust is generally quite exception-safe by default. This is because safe code can't do anything too dangerous, panics are generally only caught at thread or application boundaries (so data that witnesses a panic is usually well-isolated) and there are far fewer places that can unwind compared to "override everything" C++. But exception safety is indeed something unsafe code needs to fight for (see the aforementioned binary heap in std).

Rust's type system doesn't attempt to guard against resource leaks.

  > No custom allocator can make the task gracefully report failure
  > instead of panicking.
So, first of all, "custom allocators" means two things:

  * overloading the allocator that's used by liballoc, and
    the crates that depend on it, like libstd
  * other allocators entirely
The first is described here: https://doc.rust-lang.org/book/custom-allocators.html

And the second is still in RFCs: https://github.com/rust-lang/rfcs/pull/1398

Both of these things are not yet stable. The second does, in fact, give you the ability to return an error code, by returning a Result.

However, on top of that, I don't see how

  >  mostly a consequence of eschewing exceptions.
and

  > No custom allocator can make the task gracefully report failure
  > instead of panicking.
work together. Or rather, why is panic-ing bad, but an exception good?
> why is panic-ing bad, but an exception good?

Because the Rust people don't believe in making "catch" a first-class primitive in the language, and in fact, fully support a runtime option to turn all panics into aborts.

Even if abort-on-panic were to be killed as a legal mode of operation, and even if the stigma were to be removed from std::panic::recover, we'd still be left with a language with two error handling strategies and endless programmer confusion over which to use.

Rust's designers have done permanent damage to the language by not making exceptions the primary error reporting mechanism available to programmers, and it's not a mistake they can undo now.

> Because the Rust people don't believe in making "catch" a first-class primitive in the language, and in fact, fully support a runtime option to turn all panics into aborts.

recover() exists. You're right, there's a stigma to it, because you're not supposed to use it unless you really need to (hence, no programmer confusion). It's supposed to be used for situations like:

- Catching panics before crossing an FFI boundary

- End-of-the-world situations like OOM where you want to still handle it somehow

- Ensuring that applications can recover from internal panics in libraries (though there should be little to no panics in the libraries anyway)

The stigma for recover is for using it where you're not supposed to; as a substitute for regular error handling. In this situation, you are supposed to, so the stigma doesn't apply.
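
The FFI case from the list above might look roughly like this on a current stable toolchain, where `recover` is available as `std::panic::catch_unwind`. `parse_positive`, `checked_entry`, and the `-1` error code are assumptions for illustration. A real export would also need a stable symbol name (for example via `no_mangle`), which is omitted here:

```rust
use std::panic;

/// Internal Rust function that panics on invalid input.
fn parse_positive(input: i32) -> i32 {
    if input <= 0 {
        panic!("expected a positive value, got {}", input);
    }
    input
}

/// Hypothetical C-callable entry point. It returns -1 rather than
/// letting a panic unwind into foreign stack frames.
pub extern "C" fn checked_entry(input: i32) -> i32 {
    match panic::catch_unwind(move || parse_positive(input)) {
        Ok(value) => value,
        Err(_) => -1, // translate the panic into an error code for the C caller
    }
}
```
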

The fact that it's not a first-class primitive seems mostly irrelevant to me. Rust does a lot of things in library functions and types, even our concurrency safety mechanisms are something that can be duplicated in a library. As long as it can be used, what does it matter?

The fact that you can set the panic handler at runtime is also irrelevant. If you want to catch panics, don't do that.

The problem with the dualistic error handling strategy you're proposing is that the "severe" path gets even less testing than normal error recovery schemes do. Imagine you're working with a big non-exceptional C++ codebase (e.g., Firefox) and somebody throws std::bad_alloc. Even if you don't abort immediately and let the exception unwind the stack, the unwinding process will still leave lots of invariants broken, since all the cleanup paths are wired to return codes and will not run on unwinding.

The result is that your program can be almost arbitrarily broken after throwing. You might as well have just called longjmp.

It's because unwinding in only rare cases often produces bad results that I favor making unwinding the only error-reporting machinery in a language. If you use exceptions to report all errors, everyone starts caring about exception safety again.

Note that recover() uses Rust's type system to enforce certain things about exception safety. It's harder to mess up, even if libraries are written without unwinding in mind.
Exceptions can be turned into aborts in C++ as well, and are in many types of programs, because exceptions do have downsides for some problem domains. If Rust forced exceptions on everyone, there'd be people complaining about that just like you're complaining now.

I see the split between `Result` and `panic!` as more like Java's split between checked and runtime exceptions, except `Result` is much more usable than checked exceptions because it's part of the main data flow path, and so can use method chaining combinators instead of unwieldy try/catch blocks. OOM in Java is, like in Rust, not a checked exception, because it's not something you'd want to handle everywhere it can happen, but rather something to propagate up the stack transparently.
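
To make the "method chaining combinators" point concrete, here is a small sketch. `parse_and_double` and `describe` are hypothetical names:

```rust
use std::num::ParseIntError;

/// Parses a number and doubles it. The error stays in the return value
/// and travels along the normal data flow path.
fn parse_and_double(text: &str) -> Result<i32, ParseIntError> {
    text.trim().parse::<i32>().map(|n| n * 2)
}

/// Handles both outcomes with combinators instead of a try/catch block.
fn describe(text: &str) -> String {
    parse_and_double(text)
        .map(|n| format!("ok: {}", n))
        .unwrap_or_else(|e| format!("error: {}", e))
}
```
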

> Exceptions can be turned into aborts in C++ as well,

No, they can't. -fno-exceptions does not appear in the C++ standard. You can write a compiler for any language. C++-that-aborts-on-throw is not C++, although, sure, it's closely related.

The ability to turn off C++ exceptions was a temporary workaround for compiler deficiencies in the 1990s that snowballed into an extremely harmful schism that's still doing tremendous damage to the C++ community.

The difference between -fno-exceptions and Rust's abort-on-panic is that the former is an unofficial, disgusting hack, while the latter is getting full official support for some reason.

That's not a very meaningful distinction to make: Rust doesn't even have a standard right now. Besides, -fno-exceptions is quite useful today, not just because of 90s compiler deficiencies, and is pretty well-supported by compilers.
The existence of -fno-exceptions means that library authors either use the language as intended and accept losing a portion of their potential user base, or write less-than-optimally elegant and clear code, which punishes everyone so that a few can turn off a core feature of the language. It fragments the community.
This is an area where you just can't actually please everyone. I have heard the same opinions you've expressed in this thread, just as strongly, for even including unwinding at all. That aborts should be the only option, and that the cost of unwinding is far too high to be included in a true systems language.

Language design is tough. I'm glad we have multiple languages.

It's _because_ Rust tried to please everyone that it painted itself into this corner. If the exception people had won, life would have been great.

But if the error-code people had won, then life would still be good, because then Rust's stdlib might have been a bit uglier, but it would at least be correct with respect to error propagation. It's because Rust tries to satisfy both camps --- because it tries to give you the concision of exception code and, er, the lack of actual exceptions --- that it's forced into the terrible position of needing to abort internally on error, lacking a way to report errors to higher level code.

The lesson here is that optimizing for happiness and harmony leads to bad design.

I prefer "taking all use-cases seriously instead of abandoning a segment of users" to "happiness and harmony," as a characterization here. If serious use cases were not presented for both options, we would have enforced one. Or, if Rust weren't a systems language, we could have enforced one.

At the end of the day, if you have exceptions, you can still call abort in your exception handlers, so the split exists regardless. And without first-class support, those users are paying for a feature that they aren't using, which is against a core value of Rust.

You are arguing for replacing bad behavior "abort on OOM" with something even worse, exceptions. I honestly don't think you know what exceptions entail wrt what compilers do and the resulting bloat.
What, unwind tables? The ones that go untouched in normal operation? They're hardly catastrophic, and you need unwind support as a mandatory part of some ABIs in the first place. I know perfectly well what exceptions entail, and I maintain they're vastly better than other error handling strategies. You're the one who doesn't know what he's talking about.
There is an exception-like mechanism in Rust, in the form of the "try!" macro. It's a lot more flexible, but somewhat more verbose (Haskell has the same mechanism in a way that looks a lot more like exceptions, so that's not an inherent flaw). The best explanation I've seen is this:

http://www.jonathanturner.org/2015/11/learning-to-try-things...

tl;dr: "Result"s are like exceptions which are caught by default. You can (explicitly) propagate them upwards by using try!(...). This is nice because it means that you can tell what exceptions can occur in a block of code only using "local" information.
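
A sketch of explicit propagation. It uses the `?` operator, which current Rust provides as the operator form of `try!(...)`. `sum_fields` is a hypothetical function:

```rust
use std::num::ParseIntError;

/// Adds two numeric fields. Each `?` either unwraps the `Ok` value or
/// returns the `Err` to the caller, so every exit point is visible locally.
fn sum_fields(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = a.parse::<i32>()?; // early return on a parse error
    let y = b.parse::<i32>()?; // same here
    Ok(x + y)
}
```

The signature alone says which error can escape, which is the "local information" property described above.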

> There is an exception-like mechanism in Rust, in the form of the "try!" macro.

Correct. That's not the problem. If Rust's standard library returned Result in all cases where allocation could fail, I'd be satisfied. My primary issue is that they didn't, because Result is awkward.

Rust's designers went wrong in trying to have their cake and eat it too. They wanted to avoid exceptions and not make people care locally about errors. That's why they assert that errors just don't happen and abort if they do.

Throwing exceptions is a reasonable design choice. Returning error codes is a reasonable design choice. Pretending errors don't exist is not.

> Pretending errors don't exist is not.

We don't and we never have.

I don't think that there's any guarantee in Rust that malloc failure will abort rather than panic. That just happens to be the current implementation. I'm not sure I've ever heard of anyone running into that being an issue in practice, as opposed to this kind of abstract discussion. But I think that it wouldn't be considered a breaking change to switch from aborting to panicking if there were any kind of demand for it.

In Rust, exceptions (panic) are used for truly exceptional situations, like programmer error (indexing beyond the end of an array, division by zero) or things that practically are not expected to happen in a recoverable way in the course of ordinary use, like malloc failure. On modern virtual memory operating systems, malloc failure is so unlikely, and in application code there's so little you could reasonably do if it happened, that it is considered to be a truly exceptional case.

On the other hand, Result is used for those kinds of errors that are expected to happen in practice even with working code on reasonable systems. IO errors, errors decoding UTF-8, etc.
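
The split described in the last two paragraphs can be sketched as follows. `decode` and `first_byte` are hypothetical names:

```rust
use std::string::FromUtf8Error;

/// An expected failure: malformed input is reported through `Result`.
fn decode(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// A programmer error: indexing an empty slice panics rather than
/// returning an error value.
fn first_byte(bytes: &[u8]) -> u8 {
    bytes[0]
}
```
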

Right now, catching exceptions (panics) using recover() is still considered unstable. There is some work ongoing to try and work out the API to help ensure safety, by marking types based on whether they are exception-safe or not; so you can use recover() with types that are built in an exception-safe way, or you can wrap types in AssertRecoverSafe to assert that you are providing exception-safety guarantees yourself, but you can't just arbitrarily recover from panics in code that has access to arbitrary data without someone having added an annotation somewhere that they believe that the code is exception-safe. https://github.com/rust-lang/rust/issues/27719 Note that based on the latest discussion, recover() will likely be named something else involving "unwind" to be more explicit about what it's doing.

And exception safety is quite important to the Rust authors. Note that Mutex has a built-in exception safety mechanism, poisoning the mutex on panic so that other users can't accidentally access the protected resource without being aware that another thread panicked while holding it.
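
A minimal sketch of the poisoning behaviour just described. `poison_and_recover` is a hypothetical function and the panic is simulated:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns whether the mutex was found poisoned, plus the value inside it.
fn poison_and_recover() -> (bool, i32) {
    let shared = Arc::new(Mutex::new(0));
    let worker = Arc::clone(&shared);

    // The worker panics while holding the lock, which poisons the mutex.
    let _ = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        *guard = 1;
        panic!("simulated failure while holding the lock");
    })
    .join();

    // Later lock attempts return `Err`, so the caller has to acknowledge
    // the earlier panic before touching the data.
    let outcome = match shared.lock() {
        Ok(guard) => (false, *guard),
        Err(poisoned) => (true, *poisoned.into_inner()),
    };
    outcome
}
```

As noted earlier in the thread, the value is still reachable through `into_inner`. Poisoning is a warning, not a barrier.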

Now, there are times when handling memory allocation failures properly is more important, such as in embedded systems or in operating system kernels, where you don't have a virtual memory abstraction with over-provisioning. However, in those cases you couldn't use the standard library anyhow, as the standard library depends on OS support; so you might as well use alternate data types that do return Result on allocating operations.
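
As a rough illustration of an allocating operation that returns `Result`: this sketch uses `Vec::try_reserve_exact`, which exists in today's standard library and is not the design discussed in this thread. `make_buffer` is a hypothetical helper:

```rust
use std::collections::TryReserveError;

/// Allocates a zeroed buffer, reporting allocation failure as an error
/// value instead of aborting.
fn make_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Fallible reservation: returns `Err` on capacity overflow or when
    // the allocator reports failure.
    buf.try_reserve_exact(len)?;
    // The capacity is already reserved, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```

On a system with overcommit, a successful reservation is still no guarantee that the memory can actually be used later, which is the caveat raised several times above.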

I'm just not sure about the utility of providing a convenient way to recover from malloc failure in applications running on virtual-memory operating systems. Can you show me an example in C++ (or any other language) where this is handled properly in application code in any way that doesn't simply log and abort, in which all unwinding code in the same application also avoids allocation as it may occur while unwinding from an allocation failure, and in which these code paths are actually tested in the test suite to ensure they behave properly?

> Right now, catching exceptions (panics) using recover() is still considered unstable. ... you can't just arbitrarily recover from panics in code that has access to arbitrary data without someone having added an annotation somewhere that they believe that the code is exception-safe

And it's for this reason that I don't think I'll be choosing Rust for any of my projects in the near future. This cavalier attitude toward memory exhaustion is not only concerning in itself, but also makes me doubt the robustness and design principles of the rest of the system.

Besides, if you make exception-safe code difficult to write, nobody in practice will write it, so you'll end up with a system that's tantamount to one that just aborts. Saying that "Rust the language handles OOM just fine without stdlib!" and "we can convert OOM to panic!" is useless when these measures don't help real world code.

> In Rust, exceptions (panic) are used for truly exceptional situations

I've never accepted the argument that we need to use one error-recovery scheme for "normal" errors and another for "exceptional" ones. That kind of claim sounds reasonable, sober, and measured, but it leads to bad outcomes in every system I've seen, because the "exceptional" case in practice becomes a hard abort. A unified error handling scheme is a boon because it greatly simplifies the cognitive analysis of errors.

Java is a good example of how to do it right-ish. Serious errors are Throwables not derived from Exception, so normal catch blocks are unlikely to catch them. But serious errors are still exceptions (if not Exception), and all the usual language features for processing exceptions, including unwinding, stack trace recording, and chaining, operate normally.

Uniformity of error processing in Java is a great feature, and the language gets it without sacrificing the ability to distinguish between serious and expected errors. Now, I'm not arguing that Rust get checked exceptions, but I do have to insist that experience shows that you don't need two completely different error handling mechanisms (say, panic and Result) to mark problem severity.

> But I think that it wouldn't be considered a breaking change to switch from aborting to panicking if there were any kind of demand for it.

I'm not comfortable with casual changes in core runtime semantics.

> On modern virtual memory operating systems,

Are you just defining "modern" as "overcommit"? People (especially from the GNU/Linux world) constantly assert that allocation failure is rare, but I've seen allocations fail plenty of times, due to both address space exhaustion and global memory exhaustion. I don't have any firm numbers, but I haven't seen any from the abort-on-failure camp either.

> Can you show me an example in C++ (or any other language) where this is handled properly in application code in any way that doesn't simply log and abort, in which all unwinding code in the same application also avoids allocation as it may occur while unwinding from an allocation failure, and in which these code paths are actually tested in the test suite to ensure they behave properly?

SQLite [1] and NTFS [2] come to mind, as well as lots of tools I've discovered.

[1] https://www.sqlite.org/malloc.html

[2] guaranteed to make forward progress; pre-reserves all needed recovery resources; yes, I know NTFS runs in ring zero, but it's not the case that the kernel doesn't have to deal with dynamic memory allocation

There was a really interesting article on error handling in languages recently:

http://joeduffyblog.com/2016/02/07/the-error-model/

It makes the case that you do in fact want two different error handling mechanisms, because there are two quite different kinds of errors. The author argues that running out of memory is most practically treated as an unrecoverable error which aborts the process.