Hacker News
by nicwolff 3399 days ago
The goal is that by reading any byte you can tell if you are at the start of a character sequence, so we have to start each byte with some prefix – otherwise continuation bytes might sometimes look like start bytes. If we did as you suggest, we'd have to prefix continuation bytes with "111110", leaving only two bits of data in each!
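To make that concrete, here is a rough Python sketch of the real UTF-8 scheme, where continuation bytes carry the short prefix "10" and so keep six data bits each. The names `classify` and `sync_to_start` are made up for illustration, and the classifier only inspects the leading bits; it does not reject overlong or otherwise invalid sequences.

```python
def classify(byte: int) -> str:
    """Classify one byte by its leading bits alone (no full validation)."""
    if (byte & 0b1000_0000) == 0b0000_0000:
        return "single"        # 0xxxxxxx: one-byte (ASCII) character
    if (byte & 0b1100_0000) == 0b1000_0000:
        return "continuation"  # 10xxxxxx: six data bits
    if (byte & 0b1110_0000) == 0b1100_0000:
        return "start-2"       # 110xxxxx: first byte of a 2-byte sequence
    if (byte & 0b1111_0000) == 0b1110_0000:
        return "start-3"       # 1110xxxx: first byte of a 3-byte sequence
    if (byte & 0b1111_1000) == 0b1111_0000:
        return "start-4"       # 11110xxx: first byte of a 4-byte sequence
    return "invalid"           # 11111xxx never appears in UTF-8


def sync_to_start(data: bytes, i: int) -> int:
    """Back up from index i to the first byte of the character containing it."""
    while i > 0 and classify(data[i]) == "continuation":
        i -= 1
    return i


data = "aé€".encode("utf-8")  # bytes: 61 | C3 A9 | E2 82 AC
print([classify(b) for b in data])
print(sync_to_start(data, 5))  # index of the byte that starts "€"
```

Because no start byte begins with "10", landing anywhere in the stream and stepping backwards past continuation bytes is enough to find a character boundary.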