
Yeah, I understood what you meant, I've used wrappers like that before. My contention was with your original comment,

>It's possible to target everything from SSE and NEON to AVX512 with what is essentially a single code path.

the practice of which generally doesn't make the best use of any particular instruction set: operations that aren't available on a platform end up emulated with multiple instructions, and so on. It might be good enough for many light optimization jobs, in which case I'd say go for it; you're already doing so much better than the vast majority of programmers writing Python or whatever. But what I was trying to argue was that if you really need to crunch the hell out of some numbers, then you probably have a small enough set of target platforms that you can justify using intrinsics (or even assembly) directly for each of them.

This claim, however:

>I'm talking about e.g. AVX vs. AVX2 for floating-point code not SSE2 vs. AVX.

is a lot more reasonable, but you could do the same with some strategically placed #ifdefs around native intrinsics or assembly.
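
To make that concrete, here is a rough sketch of the #ifdef approach, not a tuned kernel. The function name `saxpy_f32` is made up for illustration, and I'm assuming a GCC/Clang-style compiler that defines `__AVX__`, `__AVX2__` and `__FMA__` according to the target flags. Note that FMA is technically its own extension and not part of AVX2, so the fused path checks for both.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: y[i] += a * x[i] for i in [0, n). */
void saxpy_f32(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;

#if defined(__AVX2__) && defined(__FMA__)
    /* AVX2-class target with FMA: one fused multiply-add per 8 floats. */
    const __m256 va = _mm256_set1_ps(a);
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        _mm256_storeu_ps(y + i, _mm256_fmadd_ps(va, vx, vy));
    }
#elif defined(__AVX__)
    /* Plain AVX: no FMA, so a separate multiply and add. */
    const __m256 va = _mm256_set1_ps(a);
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        _mm256_storeu_ps(y + i, _mm256_add_ps(_mm256_mul_ps(va, vx), vy));
    }
#endif

    /* Scalar tail; also the whole loop when neither path is compiled in. */
    for (; i < n; ++i)
        y[i] += a * x[i];
}
```

Build it with something like -mavx or -mavx2 -mfma to pick a path at compile time. The fused path rounds once instead of twice, so results can differ in the last bit between builds.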