|
|
|
|
|
by losvedir
1898 days ago
|
|
Yeah, this is great! I came across that recently when working on a parser in Zig, which treats strings as arrays of bytes. I didn't know much about UTF8 other than that it's scary and programmers mess up text processing all the time. I was worried that a multi byte code point could trick my simple char switch which was looking for certain ASCII characters. But then I came across that bit you quoted and was but surprised and relieved! Then, when I needed to minimally handle non-ASCII characters I found Zig's minimal unicode helper library and saw what I was looking for in a small function that takes a leading byte and returns how many bytes there are in the codepoint. I was impressed with the spec again! |
|