by eigenspace
2081 days ago
@avx does a ton more than just use AVX instructions. It'll reorder and unroll loops when advantageous, swap out some functions for more vectorizable versions of those functions, and apply a few other tricks. Julia uses AVX instructions by default if your code is amenable to it.
|
What I didn't see was how I could use the AVX2 instructions myself.
Checking now: Julia's count_ones() maps to LLVM's ctpop (population count) intrinsic, and recent clang versions know how to optimize that fixed-length sequence in C, even for AVX-512. So the Julia equivalent of the code I wrote should have good performance.
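For reference, here is a minimal sketch of the kind of fixed-length popcount loop meant above. It is not the original code: the 1024-bit length, the FP_WORDS constant, and the fp_popcount name are assumptions made up for illustration. It relies on __builtin_popcountll, a GCC/clang builtin. Because the trip count is a compile-time constant, the compiler is free to unroll and vectorize the loop when built with optimization and a suitable target flag (for example -O3 -mavx2); whether it actually does depends on the compiler version.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed size for illustration: 1024 bits stored as 16 64-bit words. */
#define FP_WORDS 16

/* Count the set bits in one fixed-length bit string.
 * fp_popcount is a hypothetical name, not taken from the original code. */
int fp_popcount(const uint64_t fp[FP_WORDS]) {
    int total = 0;
    /* Constant trip count: the compiler can fully unroll this loop. */
    for (size_t i = 0; i < FP_WORDS; i++) {
        total += __builtin_popcountll(fp[i]);
    }
    return total;
}
```

The loop contains no explicit intrinsics, so any use of wide registers is left entirely to the compiler.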
There are a few optimizations (keeping one AVX register loaded with a constant byte string, and using prefetch instructions) which might be missing. I'll be talking with the conference participant who brought up Julia to work this out in more detail.
Thanks for the comment!