by pmh91
1112 days ago
That's a good question. For the pure-LZ77 Iguana variant (no rANS encoding), most of the decoding time is spent moving memory around rather than decoding the match+length tuples from the input stream. That suggests the performance difference wouldn't be that great if we were only on AVX2. On the other hand, AVX-512 has a bunch of instructions that are super helpful for parsing our base254 integers quickly. If I had to take a wild guess, I'd say going without AVX-512 would cost an additional 15%. One sacrifice we made in the design is that the minimum match offset distance is always 32 bytes, which means we can always perform a literal or match copy by starting with a ymm register load + store. This hurts the compression ratio a bit, but it helps performance immensely, and for that reason alone I suspect we'd still come out ahead of lz4 even without AVX-512.
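To illustrate why the 32-byte minimum offset matters, here is a rough sketch in Python (not the actual Iguana decoder, which is not shown in this thread). The function names and the slack-space convention are assumptions made up for the example. A 32-byte slice read followed by a 32-byte slice write stands in for one ymm load + store: because the source always sits at least 32 bytes behind the destination, a single load never reads bytes that the same store is about to write, so the chunked copy matches a byte-by-byte LZ77 copy.

```python
CHUNK = 32  # width of one ymm register in bytes


def copy_match_chunked(buf: bytearray, dst: int, offset: int, length: int) -> None:
    """Copy `length` bytes from `dst - offset` to `dst`, 32 bytes at a time.

    Hypothetical helper for illustration only. It may write up to 31 bytes
    past `dst + length`, so the caller must leave that much slack in `buf`.
    """
    if offset < CHUNK:
        raise ValueError("offset must be at least 32 bytes")
    src = dst - offset
    if src < 0:
        raise ValueError("match reaches before the start of the buffer")
    chunks = (length + CHUNK - 1) // CHUNK
    if dst + chunks * CHUNK > len(buf):
        raise ValueError("not enough slack space after the destination")
    for i in range(chunks):
        lo = i * CHUNK
        block = bytes(buf[src + lo : src + lo + CHUNK])  # one "load"
        buf[dst + lo : dst + lo + CHUNK] = block         # one "store"


def copy_match_bytewise(buf: bytearray, dst: int, offset: int, length: int) -> None:
    """Reference LZ77 match copy, one byte at a time."""
    src = dst - offset
    for i in range(length):
        buf[dst + i] = buf[src + i]


if __name__ == "__main__":
    history = bytes(range(64))
    fast = bytearray(history) + bytearray(128)  # zero-filled output + slack
    slow = bytearray(fast)
    # Length exceeds the offset, so later chunks reread bytes written earlier.
    copy_match_chunked(fast, dst=64, offset=32, length=70)
    copy_match_bytewise(slow, dst=64, offset=32, length=70)
    print(fast[: 64 + 70] == slow[: 64 + 70])
```

With an offset below 32, the first load would overlap the bytes being produced, and the decoder would need a slower special case for short repeating patterns. That is the trade-off described above: some compression ratio is given up in exchange for a branch-free wide copy.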
|
Have you done any testing with multiple decompression tasks running in parallel, or with a single decompression task running alongside other work, like maybe a web server?