by adgjlsfhk1
1117 days ago
This is very cool. Thank you for releasing it! If you had to guess, how much of the performance depends on AVX-512 specifically? If this could run reasonably well on AVX2, IMO it would be a really great general successor to LZ4.
For the pure-LZ77 Iguana variant (no rANS entropy coding), most of the decoding time is spent moving memory around rather than decoding the match/length tuples from the input stream. That suggests the performance difference wouldn't be that large on AVX2 alone, although AVX-512 has a bunch of instructions that are super helpful for parsing our base-254 integers quickly. If I had to take a wild guess, I'd say dropping to AVX2 would cost an additional 15% or so.
One sacrifice we made in the design is that the minimum match offset is always 32 bytes. Because a match's source is therefore always at least 32 bytes behind its destination, we can always start a literal or match copy with a single ymm (32-byte) register load + store without the read overlapping the bytes being written. This hurts the compression ratio a bit, but it helps performance immensely, and for that reason alone I suspect we'd still come out ahead of LZ4 even without AVX-512.
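To illustrate why a 32-byte minimum offset helps, here is a minimal C sketch of the kind of "wild copy" it allows. This is not Iguana's actual code: the function names `copy_match`/`copy_literals`, the `CHUNK` constant, and the requirement that buffers carry at least 32 bytes of writable slack past the end are all assumptions for illustration. Since `offset >= 32`, each 32-byte source chunk lies entirely in output that has already been finalized, so a fixed-size `memcpy` (which compilers typically lower to one ymm load + store when AVX is enabled) reproduces the byte-by-byte LZ77 semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants: the post states the minimum offset is 32 bytes. */
#define MIN_MATCH_OFFSET 32
#define CHUNK 32

/* Copy `len` literal bytes from the input stream to the output.
 * Assumes both buffers have at least CHUNK bytes of slack past the end,
 * because the last chunk may read/write beyond `len`. */
void copy_literals(uint8_t *dst, const uint8_t *src, size_t len) {
    uint8_t *end = dst + len;
    do {
        memcpy(dst, src, CHUNK); /* fixed-size copy: one 32-byte load + store */
        dst += CHUNK;
        src += CHUNK;
    } while (dst < end);
}

/* Copy an LZ77 match of `len` bytes starting `offset` bytes back from `dst`.
 * Each chunk reads only bytes before the current chunk's start, which were
 * written earlier, so repeating patterns come out correctly. Overshoot of up
 * to CHUNK bytes past dst + len lands in slack and is later overwritten. */
void copy_match(uint8_t *dst, size_t offset, size_t len) {
    assert(offset >= MIN_MATCH_OFFSET); /* the format guarantee this relies on */
    const uint8_t *src = dst - offset;
    uint8_t *end = dst + len;
    do {
        /* src and dst are >= 32 bytes apart, so the ranges never overlap. */
        memcpy(dst, src, CHUNK);
        dst += CHUNK;
        src += CHUNK;
    } while (dst < end);
}
```

The price is exactly the trade-off described above: the format can never reference data fewer than 32 bytes back, so short-distance repeats that a format like LZ4 could encode as matches must be handled some other way, costing some compression ratio.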