by losvedir 1898 days ago
Yeah, this is great! I came across that recently when working on a parser in Zig, which treats strings as arrays of bytes. I didn't know much about UTF-8 other than that it's scary and programmers mess up text processing all the time. I was worried that a multi-byte code point could trick my simple char switch, which was looking for certain ASCII characters. But then I came across that bit you quoted and was both surprised and relieved!
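
To make that concrete, here's a rough Python sketch (not my Zig parser; `find_ascii_byte` is just a name I made up for illustration). Every byte of a multi-byte UTF-8 sequence has its high bit set (0x80 or above), so a byte-by-byte scan for an ASCII character can't get a false match from the middle of another code point:

```python
def find_ascii_byte(data: bytes, target: str) -> list:
    """Return every byte offset where the ASCII character `target` occurs in `data`."""
    needle = ord(target)
    if needle >= 0x80:
        raise ValueError("target must be an ASCII character")
    # Plain byte comparison, with no decoding step at all.
    return [i for i, b in enumerate(data) if b == needle]


# "é" encodes to 2 bytes and "世" to 3, so the comma sits at byte offset 2.
sample = "é,世".encode("utf-8")
offsets = find_ascii_byte(sample, ",")

# Every byte belonging to a non-ASCII code point is 0x80 or above.
high_bytes_only = all(
    b >= 0x80 for ch in "é世😀" for b in ch.encode("utf-8")
)
```

The offsets come back as byte offsets, not character offsets, which is what you want when slicing the original byte array anyway.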

Then, when I needed to minimally handle non-ASCII characters, I found Zig's minimal Unicode helper library and saw what I was looking for: a small function that takes a leading byte and returns how many bytes there are in the code point. I was impressed with the spec again!
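
For anyone curious, the idea behind that kind of function looks roughly like this in Python. This is a sketch of the technique, not Zig's actual source, and `utf8_sequence_length` is a hypothetical name. The count of leading 1 bits in the first byte tells you the length of the sequence:

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with `first_byte` occupies."""
    if first_byte < 0x80:            # 0xxxxxxx: plain ASCII
        return 1
    if (first_byte & 0xE0) == 0xC0:  # 110xxxxx: 2-byte sequence
        return 2
    if (first_byte & 0xF0) == 0xE0:  # 1110xxxx: 3-byte sequence
        return 3
    if (first_byte & 0xF8) == 0xF0:  # 11110xxx: 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte, and 11111xxx never appears in UTF-8.
    raise ValueError(f"0x{first_byte:02X} cannot start a UTF-8 sequence")


lengths = [utf8_sequence_length(ch.encode("utf-8")[0]) for ch in "aé世😀"]
```

Note that this only inspects the bit pattern. A strict decoder would also reject leading bytes such as 0xC0, 0xC1, and 0xF5 and above, and would check that the continuation bytes that follow are well formed.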