Hacker News new | ask | show | jobs
by exDM69 218 days ago
> Getting maximum performance out of SIMD requires rolling your own code with intrinsics

Not disagreeing with this statement in general, but with std::simd I can get 80% of the performance with 20% of the effort compared to intrinsics.

For the last 20%, there's a zero cost fallback to intrinsics when you need it.

1 comments

To clarify, there are many things SIMD is used for that look nothing like the loop parallelism or doing numerics commonly discussed. For example, heterogeneous concurrency is likely going to be beyond compilers for the foreseeable future and it is a great SIMD optimization.

A common example is executing the equivalent of a runtime SQL WHERE clause on arbitrary data structures of mixed types. Clever idioms allow surprisingly complex unrelated constraint operators to be evaluated in parallel with SIMD. It would be cool if a compiler could take a large pile of fussy, branchy scalar code that evaluates ad hoc constraints on data structures and converts it to an equivalent SIMD constraint engine but that doesn't seem likely anytime soon. So we roll them by hand.