Hacker News new | ask | show | jobs
by sippingjippers 2063 days ago
Is there any indication of the size of the lookup tables in question?
3 comments

Source code here: https://github.com/lemire/validateutf8-experiments/tree/mast...

Looks like a very tiny LUT, perhaps it's all in SIMD registers.

In any case, for anything to go at 12 GB/s, the LUT needs to be tiny and pretty much in SIMD register(s).

Any indirect memory accesses would drastically slow things down, even if it's in L1 cache [0]. At least by an order of magnitude.

Even when using something like x86 V[P]GATHER [0] (element-wise vector random access load). X86 gather is currently only slightly better than scalar code, even if all the data is on same cache line present in L1D!

[0]: https://www.felixcloutier.com/x86/vpgatherdd:vpgatherqd

48 bytes
To be specific, three lookup tables of 16 bytes each, each table having 4-bit keys.
From the paper: The first lookup table is 256 entries.

I assume there are more lookup tables as well, but I haven't read far enough to kn9ow what they all are.