Hacker News
by pkaye 3399 days ago
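Every byte after the first in a multi-byte sequence starts with the bits "10". That prefix is what gives UTF-8 its self-synchronization property: a decoder that somehow ends up in the middle of a sequence can tell it is on a continuation byte and skip ahead to the start of the next character. See the first table in this Wikipedia article: https://en.wikipedia.org/wiki/UTF-8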
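
To make the resync idea concrete, here is a rough Python sketch. The helper names `is_continuation` and `resync` are made up for illustration, and it assumes the input is well-formed UTF-8:

```python
def is_continuation(byte: int) -> bool:
    # Continuation bytes look like 10xxxxxx: mask the top two bits and compare.
    return (byte & 0b1100_0000) == 0b1000_0000


def resync(data: bytes, pos: int) -> int:
    # Skip forward past continuation bytes until we reach the first byte
    # of a character (or the end of the buffer).
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


data = "héllo €".encode("utf-8")
# Index 2 is the second byte of "é", i.e. the middle of a sequence.
start = resync(data, 2)
print(data[start:].decode("utf-8"))
```

Starting from index 2, the decoder only has to skip one byte before it is back on a character boundary, so at worst it loses the one character it landed inside.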
1 comment

Additionally, the first byte tells you how long the character is: 110xxxxx means two bytes, 1110xxxx three bytes, and 11110xxx four bytes, i.e., for multi-byte characters, number of bytes in the character = number of leading 1 bits.
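
A quick sketch of that rule in Python, in case it helps. `sequence_length` is just an illustrative name, and it only inspects the first byte without validating the rest of the sequence. A byte of the form 0xxxxxxx is plain ASCII and is treated as a one-byte character:

```python
def sequence_length(lead: int) -> int:
    # Return how many bytes the character starting with `lead` occupies.
    if (lead & 0b1000_0000) == 0b0000_0000:
        return 1  # 0xxxxxxx: single-byte ASCII
    if (lead & 0b1110_0000) == 0b1100_0000:
        return 2  # 110xxxxx
    if (lead & 0b1111_0000) == 0b1110_0000:
        return 3  # 1110xxxx
    if (lead & 0b1111_1000) == 0b1111_0000:
        return 4  # 11110xxx
    # 10xxxxxx is a continuation byte; 11111xxx never appears in UTF-8.
    raise ValueError(f"not a valid UTF-8 leading byte: {lead:#04x}")


for ch in ["a", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    # The predicted length should match the actual encoded length.
    print(ch, sequence_length(encoded[0]), len(encoded))
```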