by mandevil
495 days ago
It's more that the Gods of Moore's Law have given us so many transistors that we are essentially always I/O blocked, so it effectively doesn't matter how good our assembly is for all but the most specialized of applications. Good assembly, bad assembly, whatever, the point is that your thread is almost always going to be blocked waiting for I/O (disk, network, human input) rather than something that a fancy optimization of the loop that enables better branch prediction can fix.
this is again just more brash confidence without experience. you're wrong. this is a post about GPUs and so i'll tell you that as a GPU compiler engineer i spend my entire work day staring at and thinking about asm in order to affect register pressure, ILP, load/store efficiency, etc.
> rather than something that a fancy optimization of the loop
a fancy loop optimization (pipelining) can fix some problems (load/store efficiency) but create other problems (register pressure). the fundamental fact is that the NFL theorem applies here fully: you cannot optimize for all programs uniformly.
https://en.wikipedia.org/wiki/No_free_lunch_theorem
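
to make the pipelining tradeoff concrete, here is a minimal sketch in plain C (not GPU code, and not taken from any real compiler). the function names `sum_scaled_naive` and `sum_scaled_pipelined` are made up for illustration. the second version issues the load for the next iteration before the current iteration's arithmetic, which can hide load latency, but it keeps one more value (`next`) live across the loop body. whether that actually costs an extra register, or whether the compiler does or undoes this transformation on its own, depends on the compiler and target.

```c
#include <stddef.h>

/* Baseline: each iteration loads a value and uses it immediately,
 * so the multiply-add has to wait for the load to complete. */
float sum_scaled_naive(const float *a, size_t n, float k) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float x = a[i];   /* load */
        acc += x * k;     /* use depends directly on the load above */
    }
    return acc;
}

/* Hand software-pipelined version: the load for iteration i+1 is
 * issued before the arithmetic for iteration i. The price is one
 * extra live value (`next`) held across the loop body. */
float sum_scaled_pipelined(const float *a, size_t n, float k) {
    if (n == 0) {
        return 0.0f;
    }
    float acc = 0.0f;
    float cur = a[0];                     /* prologue: first load */
    for (size_t i = 0; i + 1 < n; ++i) {
        float next = a[i + 1];            /* load for the next iteration */
        acc += cur * k;                   /* compute on the current value */
        cur = next;
    }
    acc += cur * k;                       /* epilogue: last element */
    return acc;
}
```

both functions do the additions in the same order, so they return the same result. the only difference is the schedule of loads relative to arithmetic. deepen the pipeline (prefetch two or three iterations ahead) and the number of live values grows with it, which is where register pressure starts to bite on real kernels.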