by stu2010
1386 days ago
My Rust is not strong, but in the AVX-512 solution I don't fully understand how they advance the input by a whole AVX-512 word (16 x u32) with only input = input.offset(1);. I'd assume that advances the input pointer by a single u32. With the approach used here, it also looks like you can write some garbage into output after output_end. That isn't a problem unless output was sized to exactly the expected output length, in which case _mm512_mask_compressstoreu_epi32 would write past the end of it.
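One possible explanation, sketched below under an assumption: if input is declared as a pointer to a 512-bit vector type (e.g. *const __m512i) rather than *const u32, then offset counts in units of the pointee, so offset(1) moves 64 bytes. The Lanes16 alias is a stand-in I made up so the snippet runs on any target; it is not from the article.

```rust
// Stand-in for a 512-bit vector: 16 lanes of u32 (an assumption for illustration,
// used instead of `__m512i` so this compiles on non-x86 targets too).
type Lanes16 = [u32; 16];

/// Returns how many bytes `offset(1)` advances for a `*const u32` and for a
/// `*const Lanes16` that both point at the start of the same buffer.
fn offset_strides() -> (usize, usize) {
    let data: [u32; 32] = [0; 32];
    let scalar_ptr: *const u32 = data.as_ptr();
    let vector_ptr: *const Lanes16 = data.as_ptr() as *const Lanes16;

    // SAFETY: both results stay inside `data` (32 x u32 = 128 bytes),
    // and neither pointer is dereferenced.
    let scalar_next = unsafe { scalar_ptr.offset(1) };
    let vector_next = unsafe { vector_ptr.offset(1) };

    (
        scalar_next as usize - scalar_ptr as usize,
        vector_next as usize - vector_ptr as usize,
    )
}
```

Calling offset_strides() gives (4, 64): one u32 for the scalar pointer, sixteen u32 for the vector-typed pointer.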
E.g. filter_vec_avx2 doesn't declare the loop variable i and stores input elements into the output instead of their indices. Also, from_u32x8 uses DataType where it should use __m256i, and [u32; __m256i] where it should be [u32; 8].
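For comparison, here is a scalar sketch of the behaviour I'd expect, i.e. writing indices rather than values. The function name and the inclusive-range predicate are my own assumptions for illustration, not the article's code.

```rust
use std::ops::RangeInclusive;

/// Hypothetical scalar reference: collects the indices (not the values)
/// of the elements of `input` that fall inside `range`.
fn filter_indices_scalar(input: &[u32], range: RangeInclusive<u32>) -> Vec<u32> {
    let mut output = Vec::new();
    // `enumerate` declares the loop variable `i` explicitly.
    for (i, &value) in input.iter().enumerate() {
        if range.contains(&value) {
            // Store the position of the match, not `value` itself.
            output.push(i as u32);
        }
    }
    output
}
```

With input [5, 1, 9, 3, 7] and range 3..=7 this returns [0, 3, 4], which is a handy check against a SIMD version.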