Hacker News new | ask | show | jobs
by masklinn 1580 days ago
> Not sure if the Rust implementation has a different design to make it perform better though.

Prolly not, I expect the issue comes from the increase in branches since a value-based error reporting has to branch on every function return. Even if the branch is predictible, it’s not free.

And fib() would be a worst-case scenario as it does very little per-call, the constant per-call overhead would be rather major.

2 comments

It's also worth noting that Rust does also have stack-unwinding error propagation, in the form of `panic`/`catch_unwind`, which can be used as a less-ergonomic optimization in situations like this. Result types like this also don't color the function, since you can just explicitly panic, which would be inlined at the call site and show similar performance to C++ exceptions.
This is very much non-idiomatic. Panics are not intended as “application” error reporting, but rather as “programming” error.

The intended use-case of catch_unwind is to protect the wider program e.g. avoid breaking a threadpool worker or a scheduler on panic, or transmit the information cross threads or to collation tools like sentry.

Using up scarce branch-prediction slots is a good way to make your program unoptimizable. Time wasted because you ran out will not show up anywhere localized on your profile. (Likewise blowing any other cache.)
Using up BTB slots is an interesting problem but in practice doesn't seem to be a big issue. If it was, ISAs would use things like hinted branches but instead they've been taking them away. Code size is more important but hot/cold splitting can help there.

A problem with using exceptions instead is they defeat the return address prediction by unwinding the stack.

That hinted branches are not useful tells us nothing about the importance of branch predictor footprint. When hinting branches gives you a bigger L1 cache footprint, it has a high cost. Compilers nowadays use code motion to implement branch hinting, which does not burn L1 cache. (Maybe code motion is what you mean by "hot/cold splitting"?)

Anyway the hint we really need, no ISA has: "do not predict this branch". (We approximate that with constructs that generate a "cmov" instruction, which anyway is not going away.)

How does using exceptions defeat return address prediction? You are explicitly not returning, so any prediction would be wrong anyway. In the common case, you do return, and the predictor works fine.

> When hinting branches gives you a bigger L1 cache footprint, it has a high cost.

It was the same size on PPC, and on x86 using recommended branch directions (but not prefixes).

> Compilers nowadays use code motion to implement branch hinting, which does not burn L1 cache. (Maybe code motion is what you mean by "hot/cold splitting"?)

Hot/cold splitting is not just sinking unlikely basic blocks, it's when you move them to the end of the program entirely.

That doesn't hint branches anymore, though; Intel hasn't recommended any particular branch layout since 2006.

> How does using exceptions defeat return address prediction? You are explicitly not returning, so any prediction would be wrong anyway.

Anything that never returns is a mispredict there; most things return. What it does instead (read the DWARF tables, find the catch block, indirect jump) is harder to predict too since it has a lot of dependent memory reads.

It suffices, for cache footprint, for cold code to be on a different cache line, maybe 64 bytes away. For virtual memory footprint, being on another page suffices, ~4k away. Nothing benefits from being at the "end of the program".

Machines do still charge an extra cycle for branches taken vs. not, so it matters whether you expect to take it.

Negligibly few things never return; most of those abort. Performance of those absolutely does not matter.

Why should anyone care about predicting the catch block a throw will land in after all the right destructor calls have finished? We have already established that throwing costs multiple L3 cache misses, if not actual page faults.

> Nothing benefits from being at the "end of the program".

It's not about the benefit, that's just the easiest way to implement it - put it in a different TEXT section and let the linker move it.

Although, there is a popular desktop ARM CPU with 16KB pages.

> Machines do still charge an extra cycle for branches taken vs. not, so it matters whether you expect to take it.

Current generation CPUs can issue one taken branch or 2 not-taken branches in ~1 cycle (although strangely Zen2 couldn't), but yes it is better to be not taken iff not mispredicted. (https://www.agner.org/optimize/instruction_tables.pdf)

> Negligibly few things never return; most of those abort. Performance of those absolutely does not matter.

Throwing an exception isn't a return, nor longjmp/green threads/whatever. Sometimes they're called abnormal or non-local returns, but according to your C++ compiler your throwing function can be `noreturn`.

Error path performance is important since there are situations like network I/O where errors aren't at all unexpected. If you're writing a program you can just special case your hotter error paths, but if you're designing the language/OS/CPU under it then you have to make harder decisions.

> Why should anyone care about predicting the catch block a throw will land in after all the right destructor calls have finished? We have already established that throwing costs multiple L3 cache misses, if not actual page faults.

More prediction is always better. The earlier you can issue a cache miss the earlier you get it back.

For instance, that popular desktop ARM CPU can issue 600+ instructions at once (according to Anandtech). That's a lot of unnecessary stalls if you mispredict.

And so its vendor has their own language, presumably compatible with it, which doesn't support exceptions.