by nemequ1729
2146 days ago
(Lead developer of SIMDe here.) It sounds like you actually know what you're doing, so in this case you're probably right, at least if all you do is compile your x86 code with SIMDe. That said, SIMDe also provides support for other architectures, notably WASM SIMD 128 and AltiVec/VSX, as well as portable implementations which work everywhere, including on CPUs I'd never heard of until people told me SIMDe was working well on them (I'm thinking of Kalray, which supports vectors but doesn't have an API and instead relies on compiler auto-vectorization support). One use case for SIMDe which may be interesting for you is that you can freely mix calls to different APIs. Say, for example, that you already have a bunch of x86 code written and want a NEON port. You can add SIMDe and you get a NEON port basically for free, then you can start adding some ifdefs to add optimizations for NEON without having to rewrite the whole thing. SIMDe doesn't in any way prevent you from optimizing your NEON (or whatever) port. The way I tend to look at it is that SIMDe never makes your code slower, only more portable.
Also, about this:
> we have an extensive test suite to verify our implementations
Don’t forget about the MXCSR register in that suite, especially its rounding-control bits. I avoid changing it as much as possible because the state is preserved across context switches, so a changed rounding mode sticks to the thread and causes funny things in OpenMP and other thread pools, but not everyone is aware of that. Also, there’s a non-trivial amount of code written for SSE < 4.1 (4.1 introduced a proper rounding instruction, roundps) where you’re sometimes forced to mess with the MXCSR rounding bits because the alternatives are much slower.
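For anyone who hasn't seen the pre-4.1 pattern, a minimal sketch of it looks roughly like this. `floor_to_int_scoped` is a hypothetical name, and the code assumes an x86 target with SSE. It saves MXCSR, switches the rounding-control bits to round-down, converts, and then restores the saved value so the changed mode doesn't leak into whatever runs next on the same thread.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_SET_ROUNDING_MODE */

/* Hypothetical helper: floor(x) as an int using only SSE1.
   cvtss2si rounds according to the MXCSR rounding-control bits. */
int floor_to_int_scoped(float x) {
    unsigned int saved = _mm_getcsr();       /* remember the caller's MXCSR */
    int r;

    _MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN);   /* round toward -infinity */
    r = _mm_cvtss_si32(_mm_set_ss(x));
    _mm_setcsr(saved);                       /* always restore before returning */

    return r;
}
```

Compilers don't fully model MXCSR as an input to floating-point code, so it's worth keeping the window between the set and the restore as small as possible. A test suite can then check that `_MM_GET_ROUNDING_MODE()` is unchanged after a call like this.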