| HN Mirror



	by Rendello 1949 days ago
	It's generally used on bytes, but as others have said, it can be any symbol. Even on complete words! https://www.nayuki.io/page/huffman-coding-english-words

2 comments

user-the-name 1949 days ago

It's the opposite: It's almost never used on bytes, because that just doesn't give you a lot of compression.

It is generally used as the final stage of some other compression algorithm, and operates on symbols generated by that algorithm. Often, this is some variation on LZ77, and the symbols are something like "bytes 0-255" in addition to various symbols that denote a match in previous data of some length and at some offset.
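A rough sketch of that idea in Python, under simplifying assumptions: the token stream below is hand-written rather than produced by a real LZ77 matcher, and each `("match", length, offset)` tuple is treated as a single symbol. Real formats typically code lengths and offsets with separate alphabets. The names `huffman_code` and `tokens` are made up for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs):
    """Return {symbol: bitstring} for a dict of symbol frequencies."""
    if len(freqs) == 1:
        # A lone symbol still needs at least one bit.
        return {sym: "0" for sym in freqs}
    tiebreak = count()  # keeps the heap from ever comparing the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing a bit to each side.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical LZ77-style output: literals are ints 0-255,
# matches are ("match", length, offset) tuples.
tokens = [97, 98, 114, 97, 99, 97, 100, ("match", 4, 7), 97, ("match", 4, 7), 33]

freqs = Counter(tokens)
codes = huffman_code(freqs)
bits = "".join(codes[t] for t in tokens)
print(len(bits), "bits for", len(tokens), "tokens")
```

The Huffman stage does not care what the symbols mean. It only sees how often each one occurs, which is why literals and match tokens can share one code table.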


nayuki 1948 days ago

Indeed, this is an example where each English word in a book gets a unique symbol for the purposes of Huffman coding. Note that the Huffman output uses a base-52 alphabet (abc...xyzABC...XYZ) instead of the usual binary.
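For anyone curious what a non-binary Huffman code over words might look like, here is a minimal sketch. It is not the code from the linked page: the function name `nary_huffman`, the sample sentence, and the way digits are assigned to branches are all assumptions made for illustration. The one non-obvious step is padding with zero-weight dummy symbols so that every merge can combine exactly 52 nodes.

```python
import heapq
import string
from collections import Counter
from itertools import count

# 52 output "digits": a-z followed by A-Z.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase


def nary_huffman(freqs, alphabet=ALPHABET):
    """Return {symbol: codeword}, with codewords written in `alphabet`."""
    n = len(alphabet)
    tiebreak = count()  # keeps the heap from ever comparing the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    # Pad with zero-weight dummies until (node count - 1) is a multiple of
    # (n - 1), so each merge below can take exactly n nodes.
    while len(heap) < 2 or (len(heap) - 1) % (n - 1) != 0:
        heap.append((0, next(tiebreak), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        total, merged = 0, {}
        # Pop the n lightest subtrees and give each its own leading digit.
        for digit in alphabet:
            f, _, group = heapq.heappop(heap)
            total += f
            for sym, code in group.items():
                merged[sym] = digit + code
        heapq.heappush(heap, (total, next(tiebreak), merged))
    return heap[0][2]


words = "the quick brown fox jumps over the lazy dog and the end".split()
word_codes = nary_huffman(Counter(words))
print(" ".join(word_codes[w] for w in words))
```

With fewer than 52 distinct words every word fits in a single letter. Longer codewords only appear once the vocabulary outgrows the alphabet, as it does for a whole book.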
