Hacker News

by deathanatos 4397 days ago

> they are 36 characters which is 36-bytes in ASCII, but 108(!) bytes in 3-byte unicode.

I'm going to argue that that's not the norm: "3-byte unicode" is kinda WTF, since it doesn't really exist. If you're using UTF-8, each ASCII character encodes to a single byte, so it's a 36-byte comparison.
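A quick Python sketch of the arithmetic, assuming the 36 characters are plain ASCII. The sample string is an arbitrary placeholder, and `fixed_width_len` is a hypothetical helper that models padding every character to 3 bytes, not any real database's storage code:

```python
def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fixed_width_len(text: str, bytes_per_char: int = 3) -> int:
    """Bytes used if every character were padded to a fixed width (hypothetical scheme)."""
    return len(text) * bytes_per_char


# Arbitrary 36-character ASCII placeholder: 16 + 16 + 4 characters.
sample = "0123456789abcdef" * 2 + "-xyz"

print(len(sample))              # 36 characters
print(utf8_len(sample))         # 36 bytes: each ASCII character is 1 byte in UTF-8
print(fixed_width_len(sample))  # 108 bytes only if each character is padded to 3 bytes
```

The 108 figure only shows up if something pads every character to 3 bytes; plain UTF-8 never does that for ASCII.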

That sounds a lot like MySQL, though, which has perhaps one of the more braindead "UTF-8" implementations: its "utf8" charset allows at most 3 bytes per character (real UTF-8 is "utf8mb4"). That said, I'm not sure it actually uses 3 bytes for code points that don't require them. (At least, not that it spaces them out that way: there may be nulls past the data, sure, but those won't count in a comparison.)
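Standard UTF-8 spends 1 to 4 bytes per code point, while MySQL's "utf8" charset caps characters at 3 bytes. A small Python sketch (the sample characters are arbitrary picks) shows how many bytes a few code points actually need and which would fit under a 3-byte cap:

```python
def utf8_width(ch: str) -> int:
    """Bytes needed to encode a single code point in UTF-8."""
    return len(ch.encode("utf-8"))


# 'a', 'é', '€', and an emoji outside the Basic Multilingual Plane.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    width = utf8_width(ch)
    # A 3-byte cap can only hold code points up to U+FFFF.
    fits = width <= 3
    print(f"U+{ord(ch):04X} -> {width} byte(s), fits a 3-byte cap: {fits}")
```

ASCII still takes one byte each; only code points above U+FFFF need a fourth byte, which is exactly what a 3-byte cap can't store.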