Hacker News new | ask | show | jobs
by adrian_b 385 days ago
While you are right that in general it may be very difficult or quasi-impossible to generate optimal programs, there are also plenty of cases where an optimal program is achievable.

The latter happens when there is one dominant bottleneck for the algorithm, which is determined by the hardware, e.g. the maximum throughput of a certain instruction, like multiplication or memory load. When the implemented program reaches a throughput almost equal with that absolute limit, then one can be certain of its optimality.

1 comments

Yeah, in that case I guess it is usually memory-bound? When it is memory-bound you don't really care that much about scheduling etc.
Matrix multiplies are typically compute bound, but you don't get much option to improve the actual algorithm because Nvidia gives you an accelerator for one and anything else would be slower.