by bmalicoat 5401 days ago
Good article, but isn't the author describing Huffman codes?

Same basic idea. He's using variable-length input strings and mapping them to unique symbols rather than the other way around, but it seems like he's using entropy measurements to build an optimal trie.
Is an arbitrary bit sequence valid Unicode? If so, I'm curious whether a simple per-character Huffman encoding of English would be more efficient.
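For reference, here is a rough sketch of the per-character Huffman encoding being asked about. It is not the article's method; the function names (`huffman_code`, `encode`, `decode`) and the sample sentence are made up for illustration, and it builds the code from the character frequencies of whatever text you pass in rather than from a real English corpus.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a dict mapping each character of text to a prefix-free bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Only one distinct character: give it a 1-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreaker, {char: code_so_far}).
    # The integer tiebreaker keeps the dicts from ever being compared.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(text, code):
    """Concatenate the bit strings for each character."""
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    """Invert encode(); relies on the code being prefix-free."""
    inverse = {bits_: ch for ch, bits_ in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

sample = "this is an example of a huffman tree"  # illustrative input only
code = huffman_code(sample)
bits = encode(sample, code)
```

Comparing `len(bits)` against `8 * len(sample)` gives a crude idea of the savings for a given text, though the result here is a raw bit string and would still need to be packed into something the target medium accepts.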
No, it's not. (Although "Unicode" is ambiguous, because there are several Unicode encodings.) Even if the binary sequence can be decoded as Unicode code points, certain combinations of code points aren't allowed (surrogates, for instance, have to appear in matched pairs).
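A quick way to see this is to try decoding a few byte strings with Python's built-in codecs. This is only a sketch: `is_valid` is a hypothetical helper, and the byte strings are hand-picked examples, not anything from the article.

```python
def is_valid(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly under the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Plain ASCII is valid UTF-8.
ascii_ok = is_valid(b"hello", "utf-8")

# 0xFF can never appear in UTF-8, so these bytes are rejected.
bad_utf8 = is_valid(b"\xff\xfe", "utf-8")

# UTF-16-LE: a high surrogate (U+D800) followed by 'A' instead of a low surrogate.
unpaired_surrogate = is_valid(b"\x00\xd8\x41\x00", "utf-16-le")

# UTF-16-LE: a properly matched pair (D83D DE00), which is U+1F600.
matched_pair = is_valid(b"\x3d\xd8\x00\xde", "utf-16-le")
```

So whether arbitrary bits are "valid" depends on the encoding, but in both UTF-8 and UTF-16 some sequences are rejected outright.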