by bonsaibilly 1214 days ago
That'd be my guess, but I don't really know. They just left the "utf8" type as a broken UTF-8 capped at 3 bytes per character, and added the "utf8mb4" type and "utf8mb4_unicode_ci" collation for "no, actually, I want UTF-8 for real".
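
To make the 3-byte limit concrete, here is a rough Python sketch (not MySQL code; `utf8_len` and `fits_in_3_bytes` are names made up for this example). It just measures how many bytes each character needs in real UTF-8, on the assumption that a 3-byte-per-character type can't hold anything that needs 4:

```python
def utf8_len(ch):
    """Number of bytes a single character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_3_bytes(text):
    """Hypothetical check: True if every character needs at most 3 bytes,
    i.e. the text stays inside the Basic Multilingual Plane."""
    return all(utf8_len(ch) <= 3 for ch in text)


# U+0061 'a', U+00E9 e-acute, U+20AC euro sign, U+1F600 grinning face
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")

print(fits_in_3_bytes("caf\u00e9 \u20ac5"))     # True
print(fits_in_3_bytes("hi \U0001F600"))         # False: the emoji needs 4 bytes
```

So the characters that fall over are the ones above U+FFFF, which is where emoji live.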

It will be a fun day when Unicode crosses the 5-byte UTF-8 encoding threshold :/
It won't. We settled on using stateful combining characters instead. (Remember when the selling point of switching the world to Unicode was "represent all writing systems with a single stateless 16-bit encoding"? Yeah, well, lol.)
Anything beyond four bytes is composed of multiple code points, happily.
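
A quick Python sketch of that point (`describe` is a helper name invented here): a single on-screen "character" can run well past four bytes, but only because it is a sequence of several code points, each of which still fits in four.

```python
def describe(cluster):
    """Return (code point count, total UTF-8 bytes, largest single code point size)."""
    sizes = [len(cp.encode("utf-8")) for cp in cluster]
    return len(cluster), sum(sizes), max(sizes)


# man + ZWJ + woman + ZWJ + girl, usually rendered as one family emoji
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
# thumbs up + medium skin tone modifier
THUMBS = "\U0001F44D\U0001F3FD"

print(describe(FAMILY))   # (5, 18, 4)
print(describe(THUMBS))   # (2, 8, 4)

# The highest code point Unicode allows, U+10FFFF, still fits in 4 bytes.
print(len(chr(0x10FFFF).encode("utf-8")))   # 4
```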