by daniel-cussen
4707 days ago
In the GA144, lookup tables are pretty painful, so the way I implement reverse there is:

  reverse: a! 16 push . 2 dup . . begin +x 2* 2* unext +x 2* a . + nip ;

On Intel x86/x86-64, the fastest way I know of is to use SIMD instructions: break the 64-bit word into 16 nibbles (4-bit pieces) and use PSHUFB to perform a parallel lookup against a 16-entry table of reversed nibbles held in another 128-bit xmm register. Then you reassemble the nibbles in reverse order using inclusive OR and further shuffles.
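Here is a sketch of that nibble-lookup approach in C with SSSE3 intrinsics. It assumes GCC or Clang on x86-64 and an SSSE3-capable CPU; the function names reverse_bits_loop and reverse_bits_pshufb are hypothetical. It splits each byte into its low and high nibbles, looks both up in a 16-entry table of reversed nibbles with _mm_shuffle_epi8 (PSHUFB), swaps the two halves while OR-ing them back together, and then uses one more PSHUFB to reverse the byte order. A plain shift loop is included as a reference to check against, and on non-x86 targets the "SIMD" function simply falls back to it.

```c
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Reference version: shift one bit out of x and into r, 64 times. */
uint64_t reverse_bits_loop(uint64_t x) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

#if defined(__x86_64__)
/* Nibble-lookup version. Assumes GCC/Clang and an SSSE3-capable CPU;
   the target attribute enables PSHUFB for this function only. */
__attribute__((target("ssse3")))
uint64_t reverse_bits_pshufb(uint64_t x) {
    /* Entry i is the 4-bit reversal of i (e.g. 0x1 -> 0x8, 0x3 -> 0xC). */
    const __m128i rev_nibble = _mm_setr_epi8(0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
                                             0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF);
    /* Byte 7 goes to byte 0, ..., byte 0 to byte 7; -1 (high bit set) zeroes the upper half. */
    const __m128i rev_bytes = _mm_setr_epi8(7, 6, 5, 4, 3, 2, 1, 0,
                                            -1, -1, -1, -1, -1, -1, -1, -1);
    const __m128i low_mask = _mm_set1_epi8(0x0F);

    __m128i v  = _mm_cvtsi64_si128((long long)x);
    __m128i lo = _mm_and_si128(v, low_mask);                    /* low nibble of each byte */
    __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), low_mask); /* high nibble of each byte */

    /* Parallel table lookups: 8 nibbles per shuffle, 16 in total. */
    __m128i rlo = _mm_shuffle_epi8(rev_nibble, lo);
    __m128i rhi = _mm_shuffle_epi8(rev_nibble, hi);

    /* Within each byte, the reversed low nibble becomes the high nibble; combine with OR. */
    __m128i bytes = _mm_or_si128(_mm_slli_epi16(rlo, 4), rhi);

    /* Finally reverse the order of the 8 bytes and move the result back to a GPR. */
    return (uint64_t)_mm_cvtsi128_si64(_mm_shuffle_epi8(bytes, rev_bytes));
}
#else
/* Not x86-64: there is no PSHUFB, so fall back to the reference loop. */
uint64_t reverse_bits_pshufb(uint64_t x) {
    return reverse_bits_loop(x);
}
#endif
```

For example, reverse_bits_pshufb(1) gives 0x8000000000000000. The final byte reversal could equally be done on the scalar result with a BSWAP (e.g. GCC/Clang's __builtin_bswap64); doing it with PSHUFB just keeps the whole computation in one xmm register.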