by akavi 5401 days ago
Is an arbitrary bit sequence valid Unicode? If so, I'm curious whether a simple per-character Huffman encoding of English would be more efficient.
1 comment

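No, it's not. (Although "Unicode" is ambiguous here, because there are several Unicode encodings, such as UTF-8 and UTF-16, and each has its own rules for which byte sequences are well-formed.) Even if the bit sequence can be decoded into Unicode code points, not every sequence of code points is allowed: in UTF-16, for example, surrogates have to come in matched high/low pairs.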
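To make that concrete, here's a rough Python sketch that feeds a few byte strings to a strict decoder. The helper name `is_valid` and the sample bytes are mine, purely for illustration, and it relies on CPython's built-in `utf-8` and `utf-16-le` codecs with their default strict error handling.

```python
def is_valid(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly under `encoding`.

    Hypothetical helper for illustration; bytes.decode() uses
    strict error handling by default and raises on malformed input.
    """
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# UTF-8: plain ASCII is fine, but many other byte patterns are rejected.
print(is_valid(b"hello", "utf-8"))         # True
print(is_valid(b"\x80", "utf-8"))          # False: continuation byte with no lead byte
print(is_valid(b"\xe2\x82", "utf-8"))      # False: three-byte sequence cut short
print(is_valid(b"\xed\xa0\x80", "utf-8"))  # False: would encode the surrogate U+D800

# UTF-16 (little-endian): surrogates must come in high/low pairs.
print(is_valid(b"\x00\xd8", "utf-16-le"))          # False: lone high surrogate
print(is_valid(b"\x3d\xd8\x00\xde", "utf-16-le"))  # True: matched pair, U+1F600
```

So if you wanted to pack arbitrary bits (say, Huffman output) into text, you'd have to restrict yourself to sequences the target encoding accepts, which costs some of the space you were hoping to save.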