Hacker News
by
akavi
5401 days ago
Is an arbitrary bit sequence valid Unicode? If so, I'm curious whether a simple per-character Huffman encoding of English would be more efficient.
1 comment
JeremyBanks
5401 days ago
No, it's not. (Although "Unicode" is ambiguous here, because there are several Unicode encodings.) Even if the bit sequence can be decoded into Unicode code points, certain sequences aren't allowed: in UTF-16, surrogates have to come in matched high/low pairs, and in UTF-8, surrogate code points can't be encoded at all.
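For anyone who wants to poke at this, here is a rough sketch in Python 3 that relies on the built-in strict codecs. The helper name `is_valid` is made up for illustration, and "valid" here only means "the strict decoder accepts it", which is an assumption about what the question is asking.

```python
def is_valid(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly under the given strict codec.

    `is_valid` is a hypothetical helper, not a standard library function.
    """
    try:
        data.decode(encoding)  # default error handling is 'strict'
        return True
    except UnicodeDecodeError:
        return False


# Plain ASCII bytes are also valid UTF-8.
print(is_valid(b"hello", "utf-8"))                    # True

# The byte 0xFF never appears in well-formed UTF-8.
print(is_valid(b"\xff\xfe", "utf-8"))                 # False

# A lead byte with no continuation byte is a truncated sequence.
print(is_valid(b"\xc3", "utf-8"))                     # False

# These bytes would spell the surrogate U+D800, which UTF-8 forbids.
print(is_valid(b"\xed\xa0\x80", "utf-8"))             # False

# UTF-16-LE: high surrogate D800 followed by 'A' (0041) is unmatched.
print(is_valid(b"\x00\xd8\x41\x00", "utf-16-le"))     # False

# UTF-16-LE: D83D DE00 is a matched pair encoding U+1F600.
print(is_valid(b"\x3d\xd8\x00\xde", "utf-16-le"))     # True
```

So a Huffman-coded bit stream would generally need some wrapper (base64 or similar) before it could be passed around as text.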