Hacker News new | ask | show | jobs
by fbkr 1580 days ago
I've been using `expected`, i.e. value-or-error type, for a while in C++ and it works just fine, but the article shows it has some noticeable overhead for the `fib` workload for instance. Not sure if the Rust implementation has a different design to make it perform better though.
1 comments

> Not sure if the Rust implementation has a different design to make it perform better though.

Prolly not, I expect the issue comes from the increase in branches since a value-based error reporting has to branch on every function return. Even if the branch is predictible, it’s not free.

And fib() would be a worst-case scenario as it does very little per-call, the constant per-call overhead would be rather major.

It's also worth noting that Rust does also have stack-unwinding error propagation, in the form of `panic`/`catch_unwind`, which can be used as a less-ergonomic optimization in situations like this. Result types like this also don't color the function, since you can just explicitly panic, which would be inlined at the call site and show similar performance to C++ exceptions.
This is very much non-idiomatic. Panics are not intended as “application” error reporting, but rather as “programming” error.

The intended use-case of catch_unwind is to protect the wider program e.g. avoid breaking a threadpool worker or a scheduler on panic, or transmit the information cross threads or to collation tools like sentry.

Using up scarce branch-prediction slots is a good way to make your program unoptimizable. Time wasted because you ran out will not show up anywhere localized on your profile. (Likewise blowing any other cache.)
Using up BTB slots is an interesting problem but in practice doesn't seem to be a big issue. If it was, ISAs would use things like hinted branches but instead they've been taking them away. Code size is more important but hot/cold splitting can help there.

A problem with using exceptions instead is they defeat the return address prediction by unwinding the stack.

That hinted branches are not useful tells us nothing about the importance of branch predictor footprint. When hinting branches gives you a bigger L1 cache footprint, it has a high cost. Compilers nowadays use code motion to implement branch hinting, which does not burn L1 cache. (Maybe code motion is what you mean by "hot/cold splitting"?)

Anyway the hint we really need, no ISA has: "do not predict this branch". (We approximate that with constructs that generate a "cmov" instruction, which anyway is not going away.)

How does using exceptions defeat return address prediction? You are explicitly not returning, so any prediction would be wrong anyway. In the common case, you do return, and the predictor works fine.

> When hinting branches gives you a bigger L1 cache footprint, it has a high cost.

It was the same size on PPC, and on x86 using recommended branch directions (but not prefixes).

> Compilers nowadays use code motion to implement branch hinting, which does not burn L1 cache. (Maybe code motion is what you mean by "hot/cold splitting"?)

Hot/cold splitting is not just sinking unlikely basic blocks, it's when you move them to the end of the program entirely.

That doesn't hint branches anymore, though; Intel hasn't recommended any particular branch layout since 2006.

> How does using exceptions defeat return address prediction? You are explicitly not returning, so any prediction would be wrong anyway.

Anything that never returns is a mispredict there; most things return. What it does instead (read the DWARF tables, find the catch block, indirect jump) is harder to predict too since it has a lot of dependent memory reads.

It suffices, for cache footprint, for cold code to be on a different cache line, maybe 64 bytes away. For virtual memory footprint, being on another page suffices, ~4k away. Nothing benefits from being at the "end of the program".

Machines do still charge an extra cycle for branches taken vs. not, so it matters whether you expect to take it.

Negligibly few things never return; most of those abort. Performance of those absolutely does not matter.

Why should anyone care about predicting the catch block a throw will land in after all the right destructor calls have finished? We have already established that throwing costs multiple L3 cache misses, if not actual page faults.