by johnt15
3733 days ago
By saying single code path, I don't mean a single instruction stream. libsimdpp, for example, supports building the same code for different instruction sets, linking all the variants into the same executable and then dispatching dynamically at run time. Doing this by hand would mean that either:

- lots of time is wasted creating slightly different versions of the same code. I'm talking about, e.g., AVX vs. AVX2 for floating-point code, not SSE2 vs. AVX; or
- micro-optimization opportunities are wasted by only coding for major revisions of the instruction set.

Even when optimal performance can only be achieved via completely different approaches, SIMD wrappers are easier to use, because they present a consistent interface. Any specialized instruction can still be used by simply falling back to native intrinsics. Thus I don't see much benefit in writing SIMD code without a wrapper. The only advantage is that it's harder to shoot oneself in the foot than with naive use of these wrappers, e.g. if one never actually looks at the generated assembly code.
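libsimdpp's own dispatch machinery isn't reproduced here. As a rough, hand-rolled sketch of the same idea (one source, several instruction sets, one executable), the code below compiles a single loop body twice, once for the baseline target and once under GCC/Clang's `target("avx2")` function attribute, then picks a variant at run time with `__builtin_cpu_supports`. It assumes GCC or Clang on x86 (other targets just get the generic path); `add_arrays`, `ADD_ARRAYS_BODY` and the other names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel: out[i] = a[i] + b[i]. The body is written once and
// stamped out for each instruction set by this macro.
#define ADD_ARRAYS_BODY                     \
    for (std::size_t i = 0; i < n; ++i) {   \
        out[i] = a[i] + b[i];               \
    }

// Baseline variant: compiled with whatever flags the translation unit uses.
static void add_arrays_generic(const float* a, const float* b, float* out, std::size_t n) {
    ADD_ARRAYS_BODY
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX2_VARIANT 1
// Same body, but the compiler may use AVX2 instructions when vectorizing it.
static __attribute__((target("avx2"))) void add_arrays_avx2(const float* a, const float* b,
                                                            float* out, std::size_t n) {
    ADD_ARRAYS_BODY
}
#endif

using AddArraysFn = void (*)(const float*, const float*, float*, std::size_t);

// Choose an implementation based on what the running CPU reports.
static AddArraysFn select_add_arrays() {
#ifdef HAVE_AVX2_VARIANT
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return add_arrays_avx2;
    }
#endif
    return add_arrays_generic;
}

// Public entry point: the variant is resolved once, on the first call.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    static const AddArraysFn impl = select_add_arrays();
    assert(n == 0 || (a && b && out));
    impl(a, b, out, n);
}
```

Whether the AVX2 copy actually ends up using 256-bit instructions depends on the optimizer vectorizing the loop (e.g. at `-O3`), which is exactly the kind of thing worth checking in the generated assembly. GCC's `target_clones` attribute can generate a similar dispatcher automatically.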
>It's possible to target everything from SSE and NEON to AVX512 with what is essentially a single code path.
In practice, that does not generally make the best use of any particular instruction set: operations that aren't available on a platform have to be emulated with multiple instructions, and so on. It might be good enough for many light optimization jobs, in which case I'd say go for it; you're doing much better than the vast majority of programmers writing Python or whatever. But what I was trying to argue is that if you really need to crunch the hell out of some numbers, you probably have a small set of target platforms for which you can justify using intrinsics (or even assembly) directly.
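To make the emulation point concrete: SSE4.1 has a single instruction for a lane-wise signed 32-bit minimum (`_mm_min_epi32`), while plain SSE2 does not, so a wrapper targeting SSE2 has to build it from a compare plus three bitwise operations. The sketch below assumes x86 and the standard `<emmintrin.h>`/`<smmintrin.h>` intrinsic headers; `min_epi32` is a made-up name, not code taken from libsimdpp or any other library.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#ifdef __SSE4_1__
#include <smmintrin.h>  // SSE4.1 intrinsics (_mm_min_epi32)
#endif

// Hypothetical wrapper function: lane-wise signed 32-bit minimum.
static inline __m128i min_epi32(__m128i a, __m128i b) {
#ifdef __SSE4_1__
    // Native instruction (PMINSD) when compiling for SSE4.1 or later.
    return _mm_min_epi32(a, b);
#else
    // SSE2 fallback: four instructions instead of one.
    __m128i a_lt_b = _mm_cmplt_epi32(a, b);            // all-ones lanes where a < b
    return _mm_or_si128(_mm_and_si128(a_lt_b, a),      // take a where a < b
                        _mm_andnot_si128(a_lt_b, b));  // take b everywhere else
#endif
}
```

Built with `-msse4.1` the first branch is used; with default x86-64 flags (SSE2 only) the four-instruction fallback is compiled in instead.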
This claim, however:
>I'm talking about, e.g., AVX vs. AVX2 for floating-point code, not SSE2 vs. AVX.
is a lot more reasonable, but you could do the same with some strategically placed #ifdefs around native intrinsics or assembly.
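A sketch of that compile-time #ifdef approach for the floating-point case, assuming x86 with GCC/Clang-style feature macros (`__AVX__`, `__FMA__`); `muladd` is a hypothetical kernel. FMA is strictly its own CPU feature rather than part of AVX2, though Intel introduced both with Haswell, so it stands in here for the "AVX2-era" floating-point gain.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__) || defined(__FMA__)
#include <immintrin.h>  // AVX / FMA intrinsics
#endif

// Hypothetical kernel: out[i] = a[i] * b[i] + c[i].
// The path is fixed at compile time by the -m flags (e.g. -mavx, or -mavx2 -mfma).
void muladd(const float* a, const float* b, const float* c, float* out, std::size_t n) {
    assert(n == 0 || (a && b && c && out));
    std::size_t i = 0;
#if defined(__FMA__)
    // FMA targets: fused multiply-add on 8 floats per instruction.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        _mm256_storeu_ps(out + i, _mm256_fmadd_ps(va, vb, vc));
    }
#elif defined(__AVX__)
    // Plain AVX: separate multiply and add.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_mul_ps(va, vb), vc));
    }
#endif
    // Scalar tail, or the whole loop when neither macro is defined.
    for (; i < n; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

The trade-off versus run-time dispatch is that the choice is baked in at build time: a binary compiled with `-mfma` will typically fail with an illegal-instruction fault on a CPU without FMA, so supporting both generations means shipping, or selecting between, separate builds.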