| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by neon5077 866 days ago
	Look up Huffman codes on Wikipedia. In short, you put the most likely or most common values at the front of the dictionary. Values 0-255 take up one byte, 256-65535 take two bytes, etc. The lower your index, the fewer bits are needed to represent the corresponding state.

2 comments

vitus 866 days ago

You're missing an important detail of Huffman codes: they need to be prefix-free so you can actually tell when one value ends and another begins.

A simplified example: suppose you have four letters in your alphabet that you want to represent in binary. You can't just represent these as A=0, B=1, C=10, and D=01, since it's impossible to tell whether "01011" is meant to represent ABABB or DDA or ACBB.

(Hamming codes on the other hand are intended for error correction, where you want maximally distant code words to minimize the chance that bitflips cause decoding errors.)

link

neon5077 866 days ago

I don't really think that's too relevant to the core concept of using fewer bits for the most common codes.

That's really a problem inherent to binary streams in general, not just Huffman encoding.

link

vitus 866 days ago

It was a minor point re: this claim:

> Values 0-255 take up one byte, 256-65535 take two bytes

If you wanted to encode more than 256 values, then at best you'd be able to specify values 0-254 taking one byte, e.g. if you used 0xff as a special prefix to represent "use another byte for all other values", but that particular encoding means that you'd only be able to encode another 256 values with a second byte.

link

angry_albatross 866 days ago

I think you mean Huffman codes, not Hamming codes.

link

neon5077 866 days ago

Yup.

link