Hacker News
by LarryMade2 2493 days ago
There can be a significant trade-off between size and speed: the more tricks you use to shave off bytes, the more complexity you usually add to each iteration.

So assembly programmers may go for the more kludgy-looking code because it executes far faster than the version optimized for byte count. I've heard of such things in video timing and game loops.

2 comments

This really depends on the specific architecture and the application. In some cases, you will want to optimize mostly for size, so that your hotspots fit entirely into I-cache. Modern CPUs spend most of their time waiting for data (or instructions) to become available, so often computations are essentially free.
The average IPC over a variety of loads is, IIRC, estimated to be ~1. So no, most modern CPUs do not spend most of their time waiting for data.
Screen blitting is a frequent unroll-for-speed case, often combined with self-modifying code to make compiled sprites.
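
To make the size vs. speed trade-off concrete, here is a rough sketch in C rather than assembly. The function names `blit_row_rolled` and `blit_row_unrolled` and the unroll factor of 4 are made up for illustration, and a "row" is assumed to be a plain byte array. It only shows the unrolling idea, not compiled sprites or self-modifying code.

```c
#include <stddef.h>
#include <stdint.h>

/* Compact version: one pixel per iteration, smallest code size. */
void blit_row_rolled(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Unrolled by 4 (an arbitrary factor chosen for this sketch):
 * more code bytes, but only one counter update and one branch
 * per four pixels copied. */
void blit_row_unrolled(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;

    /* Main body: four copies per loop iteration. */
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* Tail: copy the 0 to 3 pixels left over when n is not a multiple of 4. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Both functions produce the same output; the unrolled one is just bigger. Whether it is actually faster depends on the architecture, as noted upthread, and an optimizing compiler may well unroll or vectorize either version on its own, so measure before assuming.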