by 10000truths 1143 days ago
This has been mentioned before:

https://github.com/ziglang/zig/issues/7702

I don't think anyone disagrees about the need for intrinsics. In fact, I have taken a crack at implementing the AVX512 intrinsics in the Zig compiler as builtin functions on my personal fork of the repo. But it is a non-trivial task - there are over 450 distinct instructions across the entire AVX512 feature set, and over 100 for AVX2. And I'm only focusing on support for the LLVM backend, which does the heavy lifting in the codegen phase. Getting the register allocation and instruction scheduling correct for all the intrinsics in the self-hosted backend would involve a lot more work.


What I do for D is implement the intrinsics following the semantics of the x86 instructions. They target x86, x86_64, arm32, and arm64 across D compilers, which smooths out the differences. It's a lot of work, and very similar to what the simd-everywhere library does for C++. There is not much impedance mismatch between x86 and Arm. I wish more people understood that you absolutely need such intrinsics for fast software; there is no way around that. You're not going to write your 4x-at-once pow function for each arch, and you won't find a better name for `_mm_madd_epi16`. (EDIT: I guess nowadays you could do that, but taking ARM semantics as the source of truth.)

https://github.com/AuburnSounds/intel-intrinsics
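
For readers who have not met `_mm_madd_epi16`, here is a minimal scalar sketch in C of the x86 semantics a port has to reproduce on other architectures: multiply eight signed 16-bit lanes pairwise, then add each adjacent pair of products into one signed 32-bit lane. The names `madd_epi16_ref` and `madd_epi16_ref_demo` are made up for illustration and are not taken from intel-intrinsics or any other library.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm_madd_epi16 on a 128-bit vector:
 * 8 signed 16-bit lanes in, 4 signed 32-bit lanes out.
 * out[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1], wrapping to 32 bits. */
static void madd_epi16_ref(const int16_t a[8], const int16_t b[8],
                           int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* Widen before multiplying so the products cannot overflow. */
        int64_t even = (int64_t)a[2 * i] * b[2 * i];
        int64_t odd  = (int64_t)a[2 * i + 1] * b[2 * i + 1];
        /* The sum only exceeds int32_t when all four inputs are -32768.
         * Going through uint32_t keeps the arithmetic well defined; the
         * final conversion wraps on common compilers (assumption). */
        out[i] = (int32_t)(uint32_t)(even + odd);
    }
}

/* Hypothetical usage example with hand-checked values. */
static void madd_epi16_ref_demo(void)
{
    const int16_t a[8] = {1, 2, 3, 4, -5, 6, 32767, 32767};
    const int16_t b[8] = {10, 20, 30, 40, 7, -8, 32767, 32767};
    int32_t out[4];

    madd_epi16_ref(a, b, out);
    assert(out[0] == 50);         /* 1*10 + 2*20 */
    assert(out[1] == 250);        /* 3*30 + 4*40 */
    assert(out[2] == -83);        /* -5*7 + 6*-8 */
    assert(out[3] == 2147352578); /* 2 * 32767 * 32767 */
}
```

A scalar reference like this is mainly useful as a test oracle: each per-architecture implementation can be compared against it lane by lane.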

Mostly agree, but there is actually a mismatch between madd_epi16 and Arm. Implementing the Arm semantics on x86, or the x86 semantics on Arm, requires ~5 instructions, but if we generalize the definition to allow reordering (e.g. Highway's ReorderWidenMulAccumulate [1]), it's only 2 instructions.

1: https://github.com/google/highway/blob/master/g3doc/quick_re...
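
To illustrate what "allow reordering" buys, here is a rough scalar sketch in C, not Highway's implementation. It assumes the relaxed contract is that only the total over all accumulator lanes is specified, not which product lands in which lane. `accumulate_adjacent` models the x86-style adjacent pairing; `accumulate_halves` models one plausible Arm-friendly pairing (low half, then high half). All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86-style pairing: lane i receives products 2i and 2i+1.
 * Inputs are assumed small enough that the 32-bit accumulators
 * do not overflow. */
static void accumulate_adjacent(const int16_t a[8], const int16_t b[8],
                                int32_t acc[4])
{
    for (int i = 0; i < 4; i++)
        acc[i] += (int32_t)a[2 * i] * b[2 * i]
                + (int32_t)a[2 * i + 1] * b[2 * i + 1];
}

/* Alternative pairing: lane i receives products i and i+4, as a
 * widening multiply of the low half followed by a widening
 * multiply-accumulate of the high half would produce. */
static void accumulate_halves(const int16_t a[8], const int16_t b[8],
                              int32_t acc[4])
{
    for (int i = 0; i < 4; i++)
        acc[i] += (int32_t)a[i] * b[i]
                + (int32_t)a[i + 4] * b[i + 4];
}

/* Horizontal sum of the four accumulator lanes. */
static int64_t sum_lanes(const int32_t acc[4])
{
    int64_t total = 0;
    for (int i = 0; i < 4; i++)
        total += acc[i];
    return total;
}

/* Hypothetical usage example: the lanes differ, the totals agree. */
static void reorder_demo(void)
{
    const int16_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int16_t b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    int32_t x[4] = {0, 0, 0, 0};
    int32_t y[4] = {0, 0, 0, 0};

    accumulate_adjacent(a, b, x); /* lanes: 22, 38, 38, 22 */
    accumulate_halves(a, b, y);   /* lanes: 28, 32, 32, 28 */
    assert(x[0] != y[0]);
    assert(sum_lanes(x) == sum_lanes(y)); /* both total 120 */
}
```

For a dot product, only the final horizontal sum matters, so either pairing gives the same answer and each target can use whichever is cheapest for it.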

Indeed, and your comment led me to find additional issues with my port of _mm_madd_epi16.

I agree it would perhaps be possible to find better semantics for SIMD that kinda gloss over all the differences. That would be cleaner but require a lot of names. Well I suppose that's what Highway does, isn't it?

:) Yes indeed! Always happy to discuss suggestions for new intrinsics via Github issues.
I have not been monitoring the SIMD situation in Zig so it is nice to hear that there is some general support for intrinsics even if they are not yet added.

Thanks for your effort working on an implementation too. I am aware how large these instruction sets have gotten, so I can certainly imagine at least some of the effort of the undertaking.