by daniel-cussen 4707 days ago
In the GA144, lookup tables are pretty painful, so the way I implement reverse there is:

reverse: a! 16 push . 2 dup . . begin +x 2* 2* unext +x 2* a . + nip ;

On Intel x86/x86-64, the fastest way I know of is to use SIMD instructions: break the 64-bit word into 16 nibbles (4-bit pieces) and use PSHUFB to perform a parallel lookup against a table held in another 128-bit xmm register. Then you aggregate the nibbles in reverse order, using inclusive OR and variants of the shuffle instruction.
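
To make the nibble-lookup idea concrete, here is a minimal scalar sketch in C. It is not the SIMD version: it does the 16 table lookups one at a time in a loop, where PSHUFB would do them in parallel. The names `NIBBLE_REV` and `reverse64` are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical lookup table: entry i is the 4-bit reversal of i
   (e.g. 0x1 -> 0x8, 0x3 -> 0xC). */
static const uint8_t NIBBLE_REV[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Reverse the bit order of a 64-bit word, one nibble at a time.
   Scalar stand-in for the parallel PSHUFB lookup described above. */
uint64_t reverse64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 16; i++) {
        /* Take the lowest nibble of x, reverse it via the table,
           and append it at the low end of r. The first nibble read
           ends up in the highest position after 16 iterations. */
        r = (r << 4) | NIBBLE_REV[x & 0xF];
        x >>= 4;
    }
    return r;
}
```

For a narrower word, the same routine can be reused by shifting the result down: for an 18-bit value, `reverse64(x) >> 46` leaves the reversed 18 bits in the low end of the result.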

1 comment

This does an 18-bit word, right?
Yep. I thought this would be a huge issue when using it, but first, it's really necessary for the instruction set, and second, a lot of hardware uses 18-bit, including FPGAs (often packed with 18x18 multipliers and 18-bit SRAMs, in order to support 8b/10b SERDES) and 72-bit DDR3.