|
|
|
|
|
by eesmith
3084 days ago
|
|
Shannon's original theory does not result in an optimal encoding. Page 2059 of the linked-to text (the third display page) says: > But, is that the best we can do? Shannon’s Theorem 3 does not address that question, since it only suggests a suboptimal code. (The optimal code of rate R
simply disregards all but the 2^(NR) most probable sequences of length N.) Shannon finds the answer in Theorem 4: as long as we require probability of error strictly less than 1, asymptotically, we cannot encode at rates below the entropy. This statement is commonly known as the strong converse source coding theorem. The converse (or weak converse) source coding theorem asserts that error probability cannot vanish if the compression rate is below the entropy. Later on the same page: > The variable-length source code that minimizes average length was obtained by D. Huffman So what you want is to learn about how Huffman coding works. There are many such examples - it's common in data structures courses and text books. One example is in the classic "Structure and Interpretation of Computer Programs" at https://mitpress.mit.edu/sicp/full-text/sicp/book/node41.htm... . Look around and you'll find plenty more. There are also many code examples at at https://rosettacode.org/wiki/Huffman_coding , and worked-out examples at StackOverflow, like https://stackoverflow.com/questions/11587044/how-can-i-creat... . The Wikipedia page at https://en.wikipedia.org/wiki/Huffman_coding seems unhelpful for a novice. |
|
The patents have now expired, which is why the technology starts to appear in open source and royalty free implementations like Zstd.
[1] https://en.wikipedia.org/wiki/Range_encoding [2] https://en.wikipedia.org/wiki/Arithmetic_coding [3] https://facebook.github.io/zstd/