by ncmncm
1580 days ago
It suffices, for cache footprint, for cold code to be on a different cache line, maybe 64 bytes away. For virtual memory footprint, being on another page suffices, ~4k away. Nothing benefits from being at the "end of the program". Machines do still charge an extra cycle for branches taken vs. not, so it matters whether you expect to take it. Negligibly few things never return; most of those abort. Performance of those absolutely does not matter. Why should anyone care about predicting the catch block a throw will land in after all the right destructor calls have finished? We have already established that throwing costs multiple L3 cache misses, if not actual page faults.
It's not about the benefit; that's just the easiest way to implement it: put the cold code in a different TEXT section and let the linker move it.
Although, there is a popular desktop ARM CPU with 16KB pages.
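A minimal sketch of what "put it in a different section" can look like from the source side, assuming GCC or Clang: the `cold` attribute is a compiler extension, and on common ELF targets it typically places the function in a subsection such as `.text.unlikely`, though the exact section name is up to the toolchain. `report_failure` and `process` are hypothetical names for illustration.

```cpp
#include <cstdio>

// GCC/Clang extension (assumption: not portable to other compilers).
// Marks the function as rarely executed so the compiler can group it
// with other cold code and keep it out of the hot path's cache lines.
[[gnu::cold]] [[gnu::noinline]]
void report_failure(int value) {
    std::fprintf(stderr, "bad input: %d\n", value);
}

// Hot path: the error handling is a single call into the cold function.
int process(int value) {
    if (value < 0) {
        report_failure(value);
        return -1;
    }
    return value * 2;
}
```

Whether the cold function actually lands on another page depends on the linker and on how much cold code there is.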
> Machines do still charge an extra cycle for branches taken vs. not, so it matters whether you expect to take it.
Current generation CPUs can issue one taken branch or 2 not-taken branches in ~1 cycle (although strangely Zen2 couldn't), but yes, a branch is better off not taken, as long as it isn't mispredicted. (https://www.agner.org/optimize/instruction_tables.pdf)
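As a rough illustration of telling the compiler which way you expect a branch to go, here is a sketch using `__builtin_expect`, a GCC/Clang builtin (an assumption about the toolchain; it is not standard C++). The compiler may use the hint to lay out the expected case as the fall-through, not-taken path. `sum_nonnegative` is a hypothetical function.

```cpp
#include <cstddef>

// Sums the array, returning -1 if any element is negative.
// The negative case is hinted as unexpected, so the compiler can keep
// the common path as straight-line code.
long sum_nonnegative(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // __builtin_expect(expr, expected_value): GCC/Clang extension.
        if (__builtin_expect(data[i] < 0, 0)) {
            return -1;  // rare error path
        }
        total += data[i];
    }
    return total;
}
```

The hint only affects code layout and optimization decisions; the hardware branch predictor still learns from runtime behavior.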
> Negligibly few things never return; most of those abort. Performance of those absolutely does not matter.
Throwing an exception isn't a return, and neither is longjmp, a green-thread switch, or whatever. Sometimes they're called abnormal or non-local returns, but according to your C++ compiler your throwing function can be `noreturn`.
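To make the `noreturn` point concrete, here is a small sketch in standard C++11: a function that always throws can be declared `[[noreturn]]`, even though control does continue somewhere else (the matching catch block). `fail` and `parse_digit` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// Never returns to its caller: it always leaves by throwing.
[[noreturn]] void fail(const std::string& what) {
    throw std::runtime_error(what);
}

int parse_digit(char c) {
    if (c < '0' || c > '9') {
        fail("not a digit");  // compiler knows control does not continue here
    }
    return c - '0';
}
```

So "never returns" covers more than functions that abort; it also covers functions whose only exit is a throw.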
Error path performance is important since there are situations like network I/O where errors aren't at all unexpected. If you're writing a program you can just special case your hotter error paths, but if you're designing the language/OS/CPU under it then you have to make harder decisions.
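One way to "special case your hotter error paths" is to report expected failures through the return value and keep exceptions for the truly rare ones. The sketch below is only an illustration: `ReadStatus`, `ReadResult`, and `try_read` are hypothetical, and the function simulates a read instead of touching a real socket.

```cpp
// Expected outcomes of a (simulated) non-blocking read.
enum class ReadStatus { Ok, WouldBlock, Closed };

struct ReadResult {
    ReadStatus status;
    int bytes;  // number of bytes read; 0 unless status == Ok
};

// Hypothetical stand-in for a network read: `available` is how many bytes
// the peer has sent (negative meaning the connection is closed), and
// `wanted` is the buffer size. Common failures come back as plain values,
// so handling them costs a compare and a branch, not a stack unwind.
ReadResult try_read(int available, int wanted) {
    if (available < 0) {
        return {ReadStatus::Closed, 0};
    }
    if (available == 0) {
        return {ReadStatus::WouldBlock, 0};
    }
    return {ReadStatus::Ok, available < wanted ? available : wanted};
}
```

A caller can then switch on `status` in its hot loop and only throw for conditions it genuinely never expects.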
> Why should anyone care about predicting the catch block a throw will land in after all the right destructor calls have finished? We have already established that throwing costs multiple L3 cache misses, if not actual page faults.
More prediction is always better. The earlier you can issue a cache miss the earlier you get it back.
For instance, that popular desktop ARM CPU can issue 600+ instructions at once (according to Anandtech). That's a lot of unnecessary stalls if you mispredict.
And so its vendor has their own language, presumably compatible with it, which doesn't support exceptions.