by monocasa 1584 days ago
In practice, the vast majority of MIPS code uses addu, the non-trapping variant.

And in x86 land there's the into instruction (interrupt if the overflow flag is set), so you're left with the same options.


Which has to be done after every instruction (http://boston.conman.org/2015/09/05.2), but it is quite slow. Using a conditional jump after each instruction is faster than using INTO (http://boston.conman.org/2015/09/07.1).
My guess would be a pipelining issue where `INTO` isn't treated as a `Jcc`, but as an `INT` (mainly because it is an interrupt). Agner Fog's instruction tables[0] show (for the Pentium 4) `Jcc` takes one uOP with a throughput of 2-4. `INTO`, OTOH, when not taken uses four uOPs with a throughput of 18! Zen 3 is much better with a throughput of 2, but that's still worse than `JO raiseINTO`.

[0]: https://www.agner.org/optimize/instruction_tables.pdf
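For anyone who wants to poke at the check-after-every-add pattern from C rather than assembly, here is a minimal sketch. It assumes GCC or Clang, since it relies on the `__builtin_add_overflow` builtin; on x86 those compilers will typically lower it to an `add` followed by a `jo` or `seto`, though that depends on the compiler and flags. The helper names `add_checked` and `add_or_abort` are made up for illustration and are not from the linked posts.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores a + b in *out and returns true on success;
   returns false if the signed 32-bit addition overflowed. */
static bool add_checked(int32_t a, int32_t b, int32_t *out)
{
    /* GCC/Clang builtin: returns true when the result overflowed. */
    return !__builtin_add_overflow(a, b, out);
}

/* Hypothetical helper: trap-style variant, roughly the role that INTO
   (or a trapping MIPS add) plays. Aborts instead of raising an interrupt. */
static int32_t add_or_abort(int32_t a, int32_t b)
{
    int32_t sum;
    if (__builtin_add_overflow(a, b, &sum))
        abort();
    return sum;
}
```

Compiling this with optimizations and looking at the disassembly is a quick way to see what your compiler actually emits for the check.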

It's more complicated than what shows up in microbenchmarks like that. Since you end up doing the check after pretty much every add, the jo instructions everywhere pollute your branch predictor, and that can lead to worse overall perf.