by abelsson 4635 days ago
As an intermediate step, before starting on the SSE intrinsics, I rewrote the code in a form that should have been reasonably suitable for an autovectorizer (an inner loop over a fixed number of elements; I imagine it looked fairly similar to your code), but my gcc with -ftree-vectorize didn't do much with it. I didn't explore that path any further, though.

I actually did a version that performed the minimum reduction purely in SIMD, followed by a post step that reduced the per-lane SIMD minimums to a single scalar. It was somewhat tricky to get the index right, and in the end it turned out not to be faster (at least not for the little example of 32 objects; I imagine you would gain something on a more complex scene).
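
For readers who want to see the shape of that idea, here is a minimal sketch in C with SSE2 intrinsics. It is not the code described above, just one plausible way to do it. The function name argmin_sse is hypothetical, and it assumes the input length is a positive multiple of 4 and that every value is below FLT_MAX. Each of the four lanes keeps its own running minimum and the index where it occurred, and a short scalar post step picks the winning lane.

```c
#include <assert.h>
#include <float.h>
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE float ops) */

/* Hypothetical helper: index of the smallest value in t[0..n).
 * Assumes n > 0, n is a multiple of 4, and all values are < FLT_MAX. */
static int argmin_sse(const float *t, int n)
{
    assert(n > 0 && n % 4 == 0);

    __m128  best_val = _mm_set1_ps(FLT_MAX);       /* per-lane running minimum  */
    __m128i best_idx = _mm_setzero_si128();        /* per-lane index of minimum */
    __m128i cur_idx  = _mm_set_epi32(3, 2, 1, 0);  /* indices of current block  */
    const __m128i step = _mm_set1_epi32(4);

    for (int i = 0; i < n; i += 4) {
        __m128  v    = _mm_loadu_ps(t + i);
        /* All-ones in every lane where the new value is strictly smaller. */
        __m128i mask = _mm_castps_si128(_mm_cmplt_ps(v, best_val));

        best_val = _mm_min_ps(v, best_val);
        /* SSE2 has no blend instruction, so select with and/andnot/or. */
        best_idx = _mm_or_si128(_mm_and_si128(mask, cur_idx),
                                _mm_andnot_si128(mask, best_idx));
        cur_idx  = _mm_add_epi32(cur_idx, step);
    }

    /* Post step: reduce the four lane minimums to a single scalar result. */
    float vals[4];
    int   idxs[4];
    _mm_storeu_ps(vals, best_val);
    _mm_storeu_si128((__m128i *)idxs, best_idx);

    int best = 0;
    for (int lane = 1; lane < 4; ++lane) {
        /* On equal values prefer the lower index, matching a scalar scan. */
        if (vals[lane] < vals[best] ||
            (vals[lane] == vals[best] && idxs[lane] < idxs[best]))
            best = lane;
    }
    return idxs[best];
}
```

The index bookkeeping is the fiddly part: the mask has to be computed before the minimum is updated, and ties between lanes need an explicit rule in the post step if the result should match a plain scalar loop. With only 32 elements the loop runs eight times, so it is plausible that the extra index work and the final reduction eat most of the gain.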

Anyway, it was a fun little exercise and it has sparked some interesting discussion. Thanks for posting the original.