by saagarjha
1900 days ago
It’s fairly simple, actually: leading bytes have a specific bit pattern that continuation bytes don’t. A single-byte character has the topmost bit unset (0b0xxxxxxx). In a multi-byte run, the first byte has the top two bits set (0b11xxxxxx) and every succeeding byte has the top bit set but the next bit unset (0b10xxxxxx). This means that given an arbitrary byte you can always tell which kind it is, and you can find the start of the next character by scanning forward for either of the first two bit patterns.
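To make that concrete, here is a rough Python sketch of the check. The names `classify` and `next_char_start` are made up for illustration, and it assumes the input is already valid UTF-8: it only looks at the top two bits, so it won't reject bytes such as 0xFF that can never appear in real UTF-8.

```python
def classify(byte: int) -> str:
    """Classify one byte by its top bits (hypothetical helper, no validation)."""
    if (byte & 0b1000_0000) == 0:
        return "single"        # 0b0xxxxxxx: a one-byte character
    if (byte & 0b1100_0000) == 0b1100_0000:
        return "lead"          # 0b11xxxxxx: first byte of a multi-byte run
    return "continuation"      # 0b10xxxxxx: a later byte of a multi-byte run


def next_char_start(data: bytes, i: int) -> int:
    """Return the index of the first character boundary at or after i.

    Skips continuation bytes; returns len(data) if none is found.
    """
    while i < len(data) and classify(data[i]) == "continuation":
        i += 1
    return i


data = "aé€".encode("utf-8")   # b'a\xc3\xa9\xe2\x82\xac'
kinds = [classify(b) for b in data]
```

Starting from index 4 (in the middle of the three-byte "€"), `next_char_start(data, 4)` skips the remaining continuation bytes and stops at the end of the buffer, while starting from index 2 it stops at index 3, the lead byte of "€".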