

by nknighthb 4561 days ago

Not in the general case. It's been a long time since x86 assembly developers could commonly beat a decent optimizing compiler.

The thing about compilers is that they're leveraging, even if imperfectly, the collective wisdom of their authors and of the companies who actually built the chips and have offered insight, advice, and sometimes even code. It's very probable they know more performance tricks than you do.

One problem is landmines in the ISA: instructions that look like they exist to be used, but are really traps for the unwary programmer, implemented in slow microcode that you only notice if you look closely at their performance characteristics. Or certain sequences of instructions that combine to do something ridiculously slow[1].

These landmines vary by microarchitecture. An instruction that's incredibly slow on one line of x86 chips might be a wonder-drug on another. This both increases the probability that your code will hit a landmine on at least some CPUs, and gives you a possible "in": Compilers aren't going to optimize perfectly for every microarchitecture. If you know exactly what you're doing (or spend a hell of a lot of time on trial and error), you might be able to come up with optimal codepaths for specific chips that the compiler didn't.
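To make the "codepaths for specific chips" idea concrete, here is a minimal sketch of runtime dispatch in C. Everything named here (`sum_fn`, `sum_generic`, `sum_unrolled`, `choose_sum`) is made up for illustration, and the "tuned" path is just an unrolled C loop standing in for whatever hand-written assembly you'd actually drop in. It assumes GCC or Clang on x86 for the `__builtin_cpu_supports` check and falls back to the generic path everywhere else. Note that a feature bit like AVX2 is only a rough proxy for a microarchitecture; real dispatch code tends to be pickier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every implementation of the hot routine. */
typedef uint64_t (*sum_fn)(const uint32_t *data, size_t n);

/* Portable baseline: let the compiler optimize this however it likes. */
static uint64_t sum_generic(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Hypothetical stand-in for a path tuned for one family of chips.
 * It is plain C unrolled by four; in practice this is where the
 * hand-written assembly or intrinsics would go. */
static uint64_t sum_unrolled(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += data[i];
        b += data[i + 1];
        c += data[i + 2];
        d += data[i + 3];
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        a += data[i];
    return a + b + c + d;
}

/* Pick an implementation once, based on what the CPU reports.
 * Assumption: AVX2 support is used as a crude "newer chip" signal. */
static sum_fn choose_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_unrolled;
#endif
    return sum_generic;
}
```

You'd call `choose_sum()` once at startup, stash the returned pointer, and call through it in the hot loop. The point is that the generic version stays around as the default, and the chip-specific one only kicks in where you've actually checked it helps.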

By and large it's not worth it, though. Hand-tuned assembly still ends up in places, but increasingly rarely, and it's confined to small hot-spots. A particular algorithm or part of an algorithm gets re-implemented in assembly because the compiler just can't get it right.
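If you do go down the hot-spot road, the usual discipline is to keep the boring version around, check the replacement against it, and measure before believing anything. A rough sketch of that in C, with made-up names (`popcount_reference`, `popcount_candidate`, `implementations_agree`, `time_popcount`) and a bit-counting routine picked purely as a small example. The "candidate" is still plain C here, not assembly, and `clock()` is a crude timer; treat this as the shape of the harness, not a serious benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef unsigned (*popcount_fn)(uint64_t);

/* Reference: test each of the 64 bits in turn. Slow but obviously right. */
static unsigned popcount_reference(uint64_t x)
{
    unsigned count = 0;
    for (int bit = 0; bit < 64; bit++)
        count += (unsigned)((x >> bit) & 1u);
    return count;
}

/* Candidate replacement: clear the lowest set bit on each iteration. */
static unsigned popcount_candidate(uint64_t x)
{
    unsigned count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Returns 1 if both implementations agree on `trials` pseudo-random inputs. */
static int implementations_agree(popcount_fn a, popcount_fn b, size_t trials)
{
    assert(a != NULL && b != NULL);
    uint64_t x = 0x9E3779B97F4A7C15ull;
    for (size_t i = 0; i < trials; i++) {
        if (a(x) != b(x))
            return 0;
        /* simple 64-bit linear congruential step to get the next input */
        x = x * 6364136223846793005ull + 1442695040888963407ull;
    }
    return 1;
}

/* CPU seconds spent making `iterations` calls to fn. */
static double time_popcount(popcount_fn fn, size_t iterations)
{
    assert(fn != NULL);
    volatile unsigned sink = 0;  /* keeps the calls from being optimized away */
    uint64_t x = 1;
    clock_t start = clock();
    for (size_t i = 0; i < iterations; i++) {
        sink += fn(x);
        x = x * 6364136223846793005ull + 1442695040888963407ull;
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Run `implementations_agree` first, then compare `time_popcount` for both on every chip you care about. Which one wins can flip from one microarchitecture to the next, which is exactly the problem.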

[1] I could have sworn there was a story about this just recently, but I can't seem to find it. Something like a piece of code running way slower than anyone thought it should, until an AMD engineer piped up and said "Oh yeah, don't do that, it causes a pipeline flush," for reasons that were utterly non-obvious to anyone who didn't know the internals of the chip.