UTF-8 is just... so well designed. One little feature I like in particular is that if you're looking for an ASCII-7 character in a UTF-8 stream -- say, a LF or comma -- you don't have to decode the stream first because all bytes in the encoding of non-ASCII-7 characters have the high bit set. Or as Wikipedia puts it:

> Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as / (slash) in filenames, \ (backslash) in escape sequences, and % in printf.

It's amazing to hear they put it together in one night at a diner! :-D
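To make the "no need to decode first" point concrete, here's a rough Python sketch. The function name `split_on_ascii` and the sample string are made up for illustration; the only real API involved is `bytes.split`. It splits raw UTF-8 bytes on a comma and only decodes afterwards.

```python
def split_on_ascii(data: bytes, delimiter: bytes = b",") -> list:
    """Split UTF-8 encoded bytes on one ASCII byte without decoding first.

    Hypothetical helper for illustration. It relies on the UTF-8 property
    that every byte of a multi-byte sequence is >= 0x80, so a byte below
    0x80 can only ever be a real ASCII character.
    """
    if len(delimiter) != 1 or delimiter[0] >= 0x80:
        raise ValueError("delimiter must be a single ASCII byte")
    return data.split(delimiter)


# Sample input (an assumption, not from the thread): three fields, all
# containing non-ASCII characters.
raw = "naïve,日本語,café".encode("utf-8")

# Every byte of a non-ASCII character has the high bit set.
assert all(b >= 0x80 for b in "日本語".encode("utf-8"))

# Split on the raw bytes, decode each field afterwards.
fields = [f.decode("utf-8") for f in split_on_ascii(raw)]
print(fields)
```

The same trick would not be safe in an encoding like Shift JIS or UTF-16, where a byte equal to an ASCII value can show up inside the encoding of another character.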
I guess you're saying that in good humor. But I'll add this because it makes me appreciate how these things happen:
> What happened was this. We had used the original UTF from ISO 10646 to make Plan 9 support 16-bit characters, but we hated it.
"We hated it" -- there is just so much going on in those 3 words. They could have been suffering with the previous state for a year for all we know. And even if not, it takes a lot of system-building experience to know that you hate something. So when the opportunity struck, they probably already had a laundry list of grievances built up over that time and were ready to pounce.