by Const-me
1702 days ago
I wonder how it compares to a BMI2 + AVX2 version: https://godbolt.org/z/xcT3exenr The code doesn't look great because there's no inlining. In reality, if you have many bytes in the buffer and call these functions in a loop, compilers should inline them and load all these magic vectors outside the loop. That way there won't be any memory loads inside the loop, except to fetch the source data.
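Here is a hypothetical illustration of the inlining point. It is not the code behind the godbolt link. A small `static inline` kernel lowercases ASCII 32 bytes at a time using constant "magic" vectors, and a loop over the buffer calls it. The sketch assumes GCC or Clang, whose vector extensions map a 32-byte vector onto one AVX2 register when built with `-mavx2`. The names `lower32` and `lower_ascii` are made up for this example. The constants don't depend on the input, so with optimizations on, the compiler can typically hoist them out of the loop after inlining. That leaves the source bytes as the only per-iteration loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32 bytes: one AVX2 register when built with -mavx2 (GCC/Clang vector extension). */
typedef uint8_t u8x32 __attribute__((vector_size(32)));

/* Hypothetical kernel: ASCII-lowercase 32 bytes from src into dst.
   Takes pointers rather than vector values, so the signature doesn't
   depend on whether AVX is enabled. */
static inline void lower32(const uint8_t *src, uint8_t *dst)
{
    /* "Magic" constant vectors. They don't depend on the input, so once this
       function is inlined into a loop they are loop-invariant. */
    u8x32 lo, hi, bit, v;
    memset(&lo, 'A', sizeof lo);
    memset(&hi, 'Z', sizeof hi);
    memset(&bit, 0x20, sizeof bit);

    memcpy(&v, src, sizeof v);                    /* unaligned load of source bytes */
    u8x32 upper = (u8x32)((v >= lo) & (v <= hi)); /* 0xFF where 'A'..'Z', else 0x00 */
    v |= upper & bit;                             /* setting bit 5 maps 'A'..'Z' to 'a'..'z' */
    memcpy(dst, &v, sizeof v);
}

/* Calls the kernel in a loop over the buffer, then handles the tail with scalar code. */
void lower_ascii(uint8_t *buf, size_t len)
{
    size_t i = 0;
    for (; i + 32 <= len; i += 32)
        lower32(buf + i, buf + i);
    for (; i < len; i++)
        if (buf[i] >= 'A' && buf[i] <= 'Z')
            buf[i] |= 0x20;
}
```

To check whether the three broadcasts end up before the loop, compile with `-O2 -mavx2` and inspect the loop in `lower_ascii`.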