by dragontamer
990 days ago
Your 8-bit lookup can never be parallelized, while the add/cmp/cmov version is really easy AVX-512 code that will probably get inlined and auto-vectorized. I dunno, my instinct says that's a huge benefit to the code, while the lookup table looks pretty bad.

I mean, I'm not skilled enough in those ISA extensions to stick my neck out. It's not obvious to me that there isn't some shuffle or permute instruction that can do 64 byte lookups at a time from a LUT.
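To make the contrast concrete, here is a minimal sketch. The thread doesn't say what the table maps, so the task (turning a nibble into an ASCII hex digit) and the function names `to_hex_lut` / `to_hex_arith` are hypothetical. The first loop does one indexed table load per byte. The second replaces the load with an add, a compare, and a select, which is the kind of element-wise loop body that compilers can often auto-vectorize.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task: map the low nibble (0..15) of each byte to an ASCII hex digit. */
static const char HEX_LUT[16] = "0123456789ABCDEF"; /* 16 chars; C drops the trailing NUL */

/* Table version: one indexed memory load per element. */
void to_hex_lut(const uint8_t *in, char *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = HEX_LUT[in[i] & 0x0F];
}

/* Arithmetic version: add, compare, select. No table in memory, just
   per-element arithmetic, so it is a candidate for SIMD codegen. */
void to_hex_arith(const uint8_t *in, char *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint8_t v = in[i] & 0x0F;
        uint8_t c = (uint8_t)(v + '0');
        /* 'A' - ('0' + 10) == 7: shift values 10..15 up to 'A'..'F' */
        c = (uint8_t)(c + (v > 9 ? 7 : 0));
        out[i] = (char)c;
    }
}
```

Whether either loop actually vectorizes depends on the compiler and flags (e.g. `gcc -O3 -march=x86-64-v4`), so it's worth checking the generated assembly instead of trusting instinct. On the shuffle question: x86 does have byte-shuffle instructions that act as small in-register tables, such as SSSE3 `PSHUFB` (16 entries) and AVX-512 VBMI `VPERMB` (64 entries). Whether they cover a particular LUT depends on its size.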