by microtonal 4078 days ago
Indeed, plus you need multiple hand-optimized versions: not only one per architecture, but also one per instruction-set level within an architecture, e.g. pre-AVX and AVX. An optimizing compiler gives you optimizations for all current and future platforms it targets for free; you just recompile.

Another problem is that the number of people who can write good general hand-optimized assembly is small. E.g. I used a numeric Go library (which I will not name, because I should've submitted an issue) that used assembly-'optimized' routines. Replacing those with simple C loops and turning on auto-vectorization beat those hand-written routines handsomely.
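
To make "simple C loops" concrete, here is a minimal sketch of the kind of loop a compiler can usually auto-vectorize. It is not the routine from the library mentioned above; `saxpy` is just an illustrative stand-in, and the `restrict` qualifiers are an assumption that the two arrays never overlap.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
 * Illustrative example only. `restrict` promises the compiler that x and y
 * do not alias, which removes one common obstacle to vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC, something like `gcc -O3 -march=native -fopt-info-vec -c saxpy.c` should report whether the loop was vectorized; Clang has `-Rpass=loop-vectorize` for the same purpose. Swapping `-march=native` for a specific target is how you get the pre-AVX and AVX variants from the same source, instead of maintaining one assembly file per variant.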