by sirn 2541 days ago
I can only speak from experience, but Thai uses 3 bytes per character in UTF-8 and relies on vowel signs and tone marks, encoded as separate characters, to compose a word, so the byte count grows pretty quickly. A headline of a Thai article has a good chance of exceeding the 510-byte limit.
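To make the arithmetic concrete, here is a minimal Python sketch that measures the UTF-8 size of Thai text against a 510-byte budget. The helper names `utf8_size` and `fits_budget` and the constant `IRC_BYTE_BUDGET` are hypothetical, and the sketch assumes the whole budget is available for text, whereas a real IRC line also spends bytes on the command, target and prefix.

```python
# Minimal sketch: how many UTF-8 bytes does Thai text cost?
# Assumption: the full 510-byte budget is available for text; in real IRC
# the command, target and prefix also count against it.

IRC_BYTE_BUDGET = 510  # hypothetical constant name


def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_budget(text: str, budget: int = IRC_BYTE_BUDGET) -> bool:
    """True if the UTF-8 encoding of text fits within budget bytes."""
    return utf8_size(text) <= budget


# The greeting "sawatdee khrap", spelled as explicit code points from the
# Thai block (U+0E00..U+0E7F); vowel signs are separate code points.
greeting = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35\u0e04\u0e23\u0e31\u0e1a"

print(len(greeting), utf8_size(greeting))  # 10 30

# Every Thai code point takes 3 bytes in UTF-8, so 170 Thai code points
# (vowel and tone marks included) fill the budget, versus 510 ASCII characters.
print(IRC_BYTE_BUDGET // 3)  # 170
print(fits_budget(greeting * 17))  # 170 code points = 510 bytes -> True
print(fits_budget(greeting * 18))  # 180 code points = 540 bytes -> False
```

Because vowel signs and tone marks count as code points of their own, a headline that looks like ~100 "letters" on screen can easily contain well over 170 code points.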

This is why one of the major Thai IRC networks (ThaiNet/irc.thai.com) was stuck with TIS-620 for a long time, though I'm not sure whether this is still the case. TIS-620 is an 8-bit encoding that is compatible with ASCII and uses 0xA1 to 0xFB for Thai characters, so every Thai character fits in a single byte.
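The size difference can be checked with Python's built-in `tis_620` codec. This is a minimal sketch (Python 3.9+ for the `dict[str, int]` annotation); the helper `encoded_sizes` and the sample strings are hypothetical and only illustrate that TIS-620 spends one byte per Thai character, in the 0xA1 to 0xFB range, while leaving ASCII untouched.

```python
# Minimal sketch comparing TIS-620 with UTF-8 for the same Thai text,
# using Python's built-in "tis_620" codec. Sample text is illustrative.


def encoded_sizes(text: str) -> dict[str, int]:
    """Byte length of text under TIS-620 and under UTF-8."""
    return {
        "tis_620": len(text.encode("tis_620")),
        "utf-8": len(text.encode("utf-8")),
    }


greeting = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35\u0e04\u0e23\u0e31\u0e1a"
raw = greeting.encode("tis_620")

# One byte per Thai character, all in the high half 0xA1..0xFB,
# so they never collide with the 7-bit ASCII range 0x00..0x7F.
print(len(raw), all(0xA1 <= b <= 0xFB for b in raw))  # 10 True
print(encoded_sizes(greeting))  # {'tis_620': 10, 'utf-8': 30}

# Mixed ASCII + Thai text: the ASCII prefix is byte-for-byte unchanged,
# and the round trip through TIS-620 is lossless.
mixed = "IRC: " + greeting
print(mixed.encode("tis_620")[:5])  # b'IRC: '
assert mixed.encode("tis_620").decode("tis_620") == mixed
```

Under this encoding a 510-byte line holds up to 510 Thai characters instead of about 170, which is the practical reason an 8-bit encoding stayed attractive on IRC.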