Hacker News
by Athas 2544 days ago
I remember when this was a significant problem, but with the rise of UTF-8 it is less so (even if that is strictly against the letter of the spec, it works fine in practice). Are there languages where the 510-byte message limit is a significant constraint?
3 comments

I can only speak from experience, but Thai characters take 3 bytes each in UTF-8, and words are composed of consonants combined with vowel and tone marks, each of which is a separate character, so the byte count grows quickly. The headline of a Thai article has a good chance of exceeding 510 bytes.
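
To make that arithmetic concrete, here is a small Python sketch. The sample word "สวัสดี" ("hello") and the bare 510-byte budget are illustrative assumptions; a real line also spends bytes on the command, target and server prefix. Thai sits in the U+0E00 to U+0E7F block, so every code point, including each vowel and tone mark, encodes to 3 bytes.

```python
# Compare code points and UTF-8 bytes for a short Thai word.
# "สวัสดี" is an illustrative sample: the vowel marks ั and ี are
# separate code points even though they render above the consonants.
word = "สวัสดี"

code_points = len(word)
utf8_bytes = len(word.encode("utf-8"))

# Every code point in the Thai block (U+0E00..U+0E7F) encodes to 3 bytes.
assert all(len(ch.encode("utf-8")) == 3 for ch in word)

# Rough upper bound on Thai code points in a 510-byte line,
# ignoring the command, target and server-added prefix (assumption).
max_thai_code_points = 510 // 3

print(code_points, utf8_bytes, max_thai_code_points)
```

Six visible-ish "letters" already cost 18 bytes, and a line holds at most about 170 Thai code points before anything else is counted.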

This is why one of the major Thai IRC networks was stuck with TIS-620 for a long time (ThaiNet/irc.thai.com, though I'm not sure whether this is still the case). TIS-620 is an 8-bit encoding that is compatible with ASCII and uses 0xA1 to 0xFB for Thai characters.
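
For comparison, Python's standard library ships a `tis_620` codec, so the same illustrative word can be encoded both ways. This is just a sketch of the size difference, not a claim about how any particular network or client handled the encoding.

```python
# Encode the same (illustrative) Thai word in UTF-8 and in TIS-620.
word = "สวัสดี"

utf8 = word.encode("utf-8")
tis620 = word.encode("tis_620")

# TIS-620 uses one byte per Thai character; UTF-8 uses three.
assert len(tis620) == len(word)
# Thai characters land in the high range 0xA1..0xFB, so bytes
# 0x00..0x7F keep their ASCII meaning (IRC commands stay readable).
assert all(0xA1 <= b <= 0xFB for b in tis620)
# Round trip back to the original text.
assert tis620.decode("tis_620") == word

print(len(utf8), len(tis620))
```

With one byte per character, the same 510 bytes hold roughly three times as much Thai text as UTF-8 does, at the cost of every client having to agree on the encoding.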

Isn't a major problem that the client cannot know how long its message can be, because the server prepends a prefix to it (which counts toward the 510 bytes)?
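
RFC 1459 caps a message at 512 bytes including the trailing CR-LF, leaving 510 for the command and its parameters. When the server relays a PRIVMSG it adds a `:nick!user@host` prefix, and the host it shows (a reverse-DNS name, a cloak, etc.) may not be known to the client. The sketch below assumes the relayed line has the form `:nick!user@host PRIVMSG target :text`; the helper name and the sample values are hypothetical.

```python
def max_text_bytes(nick: str, user: str, host: str, target: str) -> int:
    """Bytes left for the message text once the relayed line is built.

    Hypothetical helper: assumes the server relays the message as
    ":nick!user@host PRIVMSG target :text" and that 510 bytes
    (512 minus CR-LF) is the whole budget.
    """
    overhead = f":{nick}!{user}@{host} PRIVMSG {target} :"
    return 510 - len(overhead.encode("utf-8"))


# The same user gets a different budget depending on how the server
# displays their host, which the client cannot always predict.
short_budget = max_text_bytes("alice", "alice", "example.org", "#thai")
long_budget = max_text_bytes(
    "alice", "alice", "a-very-long-reverse-dns-name.isp.example.net", "#thai"
)
print(short_budget, long_budget)
```

A client that wants to be safe therefore has to assume a pessimistic host length and leave some slack below 510 bytes.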

But apart from that, I've certainly had my share of truncated messages, in both German and English. Sometimes you want to express a thought in full over several sentences without twenty other channel messages appearing between them while you're typing.

Cyrillic letters are encoded in UTF-8 with two bytes each, so you are limited to 255 characters.
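
That 255 is an upper bound: 510 bytes divided by 2 bytes per letter, before the command, target and prefix are subtracted. A client that wants to avoid truncation can split long text itself. Here is a minimal sketch, assuming splitting between code points is acceptable; the function name is hypothetical, and for Thai a real client would rather split at spaces or grapheme boundaries so a tone mark is not separated from its consonant.

```python
def split_utf8(text: str, budget: int) -> list[str]:
    """Split text into chunks whose UTF-8 encoding fits in `budget` bytes.

    Splits only between code points, so no multi-byte sequence is cut.
    Assumption: code-point boundaries are good enough; combining marks
    (e.g. Thai tone marks) may still end up separated from their base.
    """
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if size > budget:
            raise ValueError("budget smaller than a single character")
        if used + size > budget:
            # Current chunk is full; start a new one.
            chunks.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += size
    if current:
        chunks.append("".join(current))
    return chunks


# 300 Cyrillic letters at 2 bytes each do not fit in 510 bytes:
# the first chunk holds 255 letters, the rest spill into a second line.
parts = split_utf8("я" * 300, 510)
print([len(p) for p in parts])
```

In practice the budget passed in would be whatever is left after the server prefix, not the full 510 bytes.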