| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by nauful 2302 days ago

Do you mean following the hash chain to the end of the window?

If you want an exhaustive LZ matchfinder, a trie, binary tree such as LZMA's bt4 or suffix array are more efficient algorithms, especially with window sizes larger than 32KB, and there are other better matchfinders past 64MB such as Rabin Karp for longer matches.

Zip is inefficient as a format on modern out of order CPUs. Modern replacements such as zstd have blocks of literals, lengths and decisions to decode an entropy coded table at once rather than conditionally branch.

Also, beating zip compression is quite easy: use a larger window size and a relatively quick (small number of tested slots) hash chain (or multiple e.g. 3, 4 and 7 bytes) can find better matches.

If decompression time and speed is not an issue, and you aren't limited to LZ based approaches, a bytewise model such as ppm or bitwise such as paq will provide much better compression without too much thinking.

Edit: forgot to mention, the parsing strategy is also important. Finding an approximately optimal lowest cost path through a block can often save 10%+ ratio with LZ + Huffman.