Hacker News
by MrRadar 2724 days ago
> This is an interesting line of argument - in what way, and what could be done to improve the ergonomics?

Due to the enormous complexity of modern CPUs, I'm not sure there's anything that could be done. With the 486 and contemporary (and earlier) microarchitectures (uarches), you could largely expect the CPU to execute exactly what you wrote, so understanding the performance impact of any given bit of assembly was pretty straightforward. Then CPUs started adding features like superscalar, speculative, and out-of-order execution, branch prediction, deep pipelines, register renaming, and multi-level caching, all of which massively complicate modeling the performance of any given code.

For example, you may need to explicitly clear an architectural register before reusing it for a new calculation to avoid creating a false dependency in the uarch, which would prevent the CPU from executing the calculations in parallel. Knowing when this is necessary can be hard, and the rules usually differ between uarches, even within the same uarch family.
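To make the false-dependency point concrete, here is a toy model in Python, not a simulation of any real CPU. It assumes unlimited execution units and perfect register renaming, and it invents a `merges_dst` flag to mark an instruction that the uarch treats as also reading its destination register. The instruction names (`popcnt`, `xor`), the 3-cycle latency, and the register names are illustrative assumptions only; the zeroing idiom is modeled as having no inputs and no latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dst: str
    srcs: tuple               # registers the instruction truly reads
    latency: int = 0
    merges_dst: bool = False  # assumption: uarch also waits on the old dst value
    zeroing: bool = False     # assumption: dependency-breaking idiom (e.g. xor r, r)


def critical_path(program):
    """Cycle at which the last result is ready in this toy model."""
    ready = {}   # register -> cycle at which its newest value is available
    finish = 0
    for ins in program:
        if ins.zeroing:
            done = 0  # modeled as input-free and resolved at rename time
        else:
            inputs = list(ins.srcs)
            if ins.merges_dst:
                inputs.append(ins.dst)  # the false dependency on the old value
            start = max((ready.get(r, 0) for r in inputs), default=0)
            done = start + ins.latency
        ready[ins.dst] = done
        finish = max(finish, done)
    return finish


# Four independent counts that all reuse the same destination register.
chained = [Instr("popcnt", "rax", (f"r{i}",), 3, merges_dst=True) for i in range(4)]

# Same work, but the destination is cleared before each reuse.
broken = []
for ins in chained:
    broken.append(Instr("xor", "rax", (), zeroing=True))
    broken.append(ins)

print(critical_path(chained))  # 12: each count waits for the previous one
print(critical_path(broken))   # 3: the counts can overlap
```

The four calculations never use each other's results, yet in the first program they serialize purely because they share a destination register. Which instructions behave this way on real hardware is exactly the per-uarch detail that is hard to keep track of.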

Good assembly programmers who are aware of all this complexity can still beat compilers, but they certainly can't do that at the scale of code that compilers routinely generate. Thankfully, compilers are generally "good enough" these days, and assembly only needs to be hand-written for very hot inner loops in performance-critical code, or for cryptographic code where the exact performance characteristics of the code (such as how long it takes to run) could leak information if they're not handled correctly.
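As a rough illustration of how performance characteristics can leak information, the sketch below compares a guess against a secret in two ways and counts the bytes examined as a stand-in for running time. The function names and the sample secret are made up for this example, and Python itself gives no timing guarantees, so this only shows the shape of the problem.

```python
def early_exit_equal(a: bytes, b: bytes):
    """Return (equal, bytes_compared); stops at the first mismatch."""
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps
    return len(a) == len(b), steps


def fixed_work_equal(a: bytes, b: bytes):
    """Return (equal, bytes_compared); always visits every byte pair."""
    diff = len(a) ^ len(b)
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        diff |= x ^ y  # accumulate differences without branching on them
    return diff == 0, steps


secret = b"hunter2!"  # hypothetical secret, for illustration only

print(early_exit_equal(secret, b"xxxxxxxx"))  # (False, 1)
print(early_exit_equal(secret, b"huntxxxx"))  # (False, 5)
print(fixed_work_equal(secret, b"xxxxxxxx"))  # (False, 8)
print(fixed_work_equal(secret, b"huntxxxx"))  # (False, 8)
```

With the early-exit version, the amount of work reveals how many leading bytes of the guess were right. The fixed-work version does the same amount of work either way, but in a compiled language an optimizer is free to turn it back into something with data-dependent branches, which is one reason this kind of code ends up hand-written.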