I work in DB kernels, everything gets hand tuned as there is economic reason for hand tuning it. The expectation in many of these systems is that there are no wasted cycles. You can codegen a decent kernel, but then someone will find a better approach - do you want the slower version of the product, or the faster one?
You can see this in action with matrix math libraries, folks have been hand tuning those for decades at this point.
Your claim is that there are automated methods (which I mentioned in my original post) to manage the complexity. My claim is that it requires a large team of engineers working on it. I'm not really sure what you think you've refuted.
You can see this in action with matrix math libraries, folks have been hand tuning those for decades at this point.