|
|
|
|
|
by floofy222
2671 days ago
|
|
I see that you are using MMX intrinsics directly, like _mm_sub_pi8, but you are never calling _mm_empty (https://software.intel.com/sites/landingpage/IntrinsicsGuide...) as required by the SysV AMD64 ABI (and pretty much all other ABIs out there). I think the behavior of all the code that touches is undefined (it breaks the calling convention of the ABI), and while this often results in corrupted floating point values in registers, maybe you won't see much if you are not using the FPU. Still, since the function is inline, chances that this gets inlined somewhere where it could cause trouble seem high. You might want to look into that. Also, I wish this would all be written in Rust, there is great portable SIMD support over there. Might make your life easier trying to target other platforms. EDIT: as burntsushi mentions below, that's not available in stable Rust, but if you want to squeeze out the last once of performance out of the Rust compiler, chances are you won't be using that anyways. |
|
If, once you review our codebase and verify that we are not inadvertently using a 22-year-old SIMD extension but still have undefined behavior, please write an issue on github.
I'm admiring Rust from a distance at this stage. I am comfortable enough with writing bare intrinsics and slapping a giant #ifdef around stuff.