by CoastalCoder 38 days ago
> 100% of the optimum, which is anyway unattainable.

Can you expand on this? Sounds like an interesting discussion.

2 comments

:) I figure there is always something left to improve. For kernels that really want to keep 30+ registers live, the compiler might not do as good a job as careful manual tuning, so intrinsics can carry a bit of a cost. But optimization time is also limited, so I'd rather get several kernels to 90% than one kernel to 99%.
Not who you asked, but I think the point is that SIMD intrinsics differ on each platform, so having something that is portable and sometimes runs faster is worth having. Writing separately for Intel, ARM, and a zoo of other instruction sets is not an option for some.
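
To make the "different on each platform" point concrete, here is a minimal sketch of what hand-written intrinsics look like for a trivial kernel (elementwise float add). The function name `add_arrays` is made up for illustration and is not from any library; the intrinsics themselves are the standard SSE and NEON ones. Note that the same four-lane loop has to be written once per instruction set, plus a scalar fallback:

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>  // x86 SSE/SSE2 intrinsics
#elif defined(__ARM_NEON)
#include <arm_neon.h>   // Arm NEON intrinsics
#endif

// Hypothetical helper for illustration: out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
  std::size_t i = 0;
#if defined(__SSE2__)
  // x86 path: 4 floats per iteration, unaligned loads and stores.
  for (; i + 4 <= n; i += 4) {
    const __m128 va = _mm_loadu_ps(a + i);
    const __m128 vb = _mm_loadu_ps(b + i);
    _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
  }
#elif defined(__ARM_NEON)
  // Arm path: same algorithm, entirely different types and function names.
  for (; i + 4 <= n; i += 4) {
    const float32x4_t va = vld1q_f32(a + i);
    const float32x4_t vb = vld1q_f32(b + i);
    vst1q_f32(out + i, vaddq_f32(va, vb));
  }
#endif
  // Scalar tail; also the only path on targets without SSE2 or NEON.
  for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

Every extra target (AVX2, AVX-512, SVE, ...) would mean another branch like these, which is the maintenance burden a portable abstraction is trying to avoid.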