| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by userbinator 684 days ago

Besides a persistent off-by-one error, and the use of actual trees instead of table lookup for canonical Huffman, this is a pretty good summary of the LZ+Huffman process in general; and in the 80s through the mid 90s, this combination was widely used for data compression, before the much more resource-intensive adaptive arithmetic schemes started becoming popular. It's worth noting that the specifics of DEFLATE were designed by Phil Katz, of PKZIP fame. Meanwhile, a competing format, the Japanese LZH, was chosen by several large BIOS companies for compressing logos and runtime code.

Note that real-world GZIP decoders (such as the GNU GZIP program) skip this step and opt to create a much more efficient lookup table structure. However, representing the Huffman tree literally as shown in listing 10 makes the subsequent decoding code much easier to understand.

Is it? I found the classic tree-based approach to become much clearer and simpler when expressed as a table lookup --- along with the realisation that the canonical Huffman codes are nothing more than binary numbers.

4 comments

082349872349872 683 days ago

I'd agree with TFA that canonical Huffman, although interesting, would be yet another thing to explain, and better left out of scope, but it does raise a question:

In what other areas (there must be many) do we use trees in principle but sequences in practice?

(eg code: we think of it as a tree, yet we store source as a string and run executables which —at least when statically linked— are also stored as strings)

link

mxmlnkn 682 days ago

> In what other areas (there must be many) do we use trees in principle but sequences in practice?

Heapsort comes to mind first.

link

commandlinefan 683 days ago

Author here! I actually did a follow-up where I looked at the table-based decoding: https://commandlinefanatic.com/cgi-bin/showarticle.cgi?artic...

link

yyyk 683 days ago

>before the much more resource-intensive adaptive arithmetic schemes started becoming popular

The biggest problem was software-patent stuff nobody wanted to risk before they expired.

link

lifthrasiir 684 days ago

The lookup table here would be indexed by a fixed number of lookahead bits, so it for example would duplicate shorter codes and put longer codes into side tables. So a tree structure, either represented as an array or a pointer-chasing structure would be much simpler.

link