Hacker News
by slezyr 2543 days ago
> IRC messages are always lines of characters terminated with a CR-LF (Carriage Return - Line Feed) pair, and these messages shall not exceed 512 characters in length, counting all characters including the trailing CR-LF. Thus, there are 510 characters maximum allowed for the command and its parameters.

Yeah, let's just forget how painful it was for non-English-speaking users to use IRC. You get 255 characters with Unicode, or 510 with a single-byte encoding, but then you have to guess the encoding.

3 comments

As a user of non-English languages, this is not really a problem in practice. We settled on UTF-8 years ago. My second language is practically the worst case for bumping up against these limits and I never have an issue with them.
Really? So you think Japanese folks just don't run into problems? Or Koreans? Or anyone using a primarily upper-Unicode alphabet that is phonetic?
Japanese and Chinese in particular can compress a lot more meaning into a byte than many other languages[1]. I picked a random article at Nikkei.com[2] and calculated the number of bytes in the first paragraph, and it's only 449 bytes in UTF-8[3]. Chinese is even more efficient at this, as you can basically fit the whole news in a Tweet.

[1]: Idiomatic Yojijukugo 四字熟語 is an extreme example of this, but there are non-idiomatic Yojijukugo too, e.g. 日米関係 is a 12-byte word that translates to "United States-Japan relations"

[2]: https://www.nikkei.com/article/DGXMZO46571150V20C19A6000000/

[3]: It describes how people were walking around a park in Chicago on Jun 13 to catch a rare Pokemon, along with a one-line interview with a son of Mr. Stuart from California.

(I speak three languages: Thai, English, Japanese)
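For anyone who wants to check the byte counts in footnote [1], here is a minimal Python sketch. The helper name `utf8_len` is made up for illustration; it only compares code point count with UTF-8 byte count.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

word = "日米関係"
english = "United States-Japan relations"

# Code points vs. UTF-8 bytes for the same meaning in two languages.
print(len(word), utf8_len(word))        # 4 code points, 12 bytes
print(len(english), utf8_len(english))  # 29 code points, 29 bytes
```

So the Japanese word costs three bytes per character but still comes out at less than half the bytes of its English translation.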

This is fair, I should have thought more about the list.

For Japanese, I don't think the way people talk casually to one another is as amenable to compression as newspaper headlines.

But surely for Thai you're in a sub-optimal boat?

This is completely anecdotal, but I have an alternative Twitter account where I interact with Japanese people I know, and I rarely hit the 140-character limit except when I’m in a heated debate (or when I’m VERY excited about something).

For Thai, yeah, this one is a little more complicated. I’ve commented about this in a sibling thread.

I tried to rig up a Shavian chat group and we just hit the wall over and over. It was frustrating so we moved to Matrix.
私の二番目の言語は日本語だよ ("My second language is Japanese, you know.")
This makes it all the more baffling to me. You get 510 a line, but for a channel with a modest name in any other language, you get much less than that.

Let's just use a modest channel name like "#𐑥𐑨𐑔 𐑯 𐑕𐑲𐑧𐑯𐑕". I've now got a base of 45 bytes without any message at all. If I want to aim a message at someone I have even less than that. Your pithy reply with a similarly modest title is 20% of the total allocation for a line, half of which is just overhead.

We run into line limits talking about category theory in #haskell even in English, and folks are quite good at compressing contexts. The only alternative is to slice your messages across lines and make a confusing experience for participants.

I do operate a Korean IRC network, and a message cut off in the middle (often inside a UTF-8 sequence, which makes clients guess the wrong encoding from time to time) is a typical sight.
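To illustrate the mid-character cut described above: a rough Python sketch of truncating on a UTF-8 boundary instead of at an arbitrary byte. The function name `truncate_utf8` is hypothetical, and it assumes the input is already valid UTF-8; it is not how any particular server or client does it.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut `data` to at most `limit` bytes without splitting a UTF-8 sequence."""
    if len(data) <= limit:
        return data
    end = limit
    # Continuation bytes have the form 0b10xxxxxx. If the first byte we would
    # drop is a continuation byte, the cut falls inside a character, so back up
    # until the cut lands on the start of a character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]

msg = "한국어".encode("utf-8")  # 3 characters, 3 bytes each
print(truncate_utf8(msg, 4).decode("utf-8"))  # 한

# A naive cut at the same limit leaves a dangling lead byte:
try:
    msg[:4].decode("utf-8")
except UnicodeDecodeError as exc:
    print("naive cut is not valid UTF-8:", exc.reason)
```

A client that receives the naively cut line sees invalid UTF-8 at the end, which is presumably what pushes it to fall back to guessing some other encoding.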
I remember when this was a significant problem but with the rise of UTF-8 it is less so (even if that is strictly against the letter of the spec, it works fine in practice). Are there languages where the 510 bytes per message are a significant limitation?
I can only speak from experience, but the Thai language uses 3 bytes per character in UTF-8 and relies on vowels and tone marks to compose a word, so the number of bytes can grow pretty quickly. A headline of an article in Thai has a good chance of exceeding 510 bytes.

This is why one of the major Thai IRC networks was stuck with TIS-620 for a long time (ThaiNet/irc.thai.com, though I'm not sure if this is still the case), which is 8-bit compatible with ASCII (it uses 0xA1 to 0xFB for Thai characters).
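A small Python sketch of the size difference between the two encodings. The sample word is my own choice for illustration, and the codec name `tis_620` is the one Python's standard library uses.

```python
text = "สวัสดี"  # sample Thai word picked for illustration (includes vowel marks)

utf8 = text.encode("utf-8")
tis = text.encode("tis_620")

# TIS-620 spends one byte per code point; UTF-8 spends three for Thai.
print(len(text), len(tis), len(utf8))

# Every Thai byte in TIS-620 falls in the 0xA1..0xFB range mentioned above.
print(all(0xA1 <= b <= 0xFB for b in tis))

# How many Thai code points fit in 510 bytes under each encoding.
print(510 // 1, 510 // 3)  # 510 vs. 170
```

Note that vowel and tone marks are separate code points, so the number of visible "letters" that fit is lower than either figure.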

Isn't a major problem that the client cannot know how long the message can be, because it gets prefixed again on the server side (which counts toward the 510 bytes)?

But apart from that, I've certainly had my share of truncated messages, both in German and English. Sometimes you want to express a thought in full in several sentences without twenty other channel messages appearing in between the sentences while you're typing.
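The prefix problem from the first paragraph can be put in numbers with a rough Python sketch. It assumes the relayed line has the usual `:prefix PRIVMSG target :text` shape; the prefix `nick!user@example.org` is invented, and a real client can only estimate it, since the host other users see may differ from what the client expects.

```python
def payload_budget(prefix: str, target: str, limit: int = 512) -> int:
    """Bytes left for the message text once the relayed line's overhead is paid.

    `prefix` is the sender's nick!user@host as other clients will see it
    (an estimate on the sending side); `limit` counts the trailing CR-LF.
    """
    overhead = len(f":{prefix} PRIVMSG {target} :".encode("utf-8")) + 2  # + CR-LF
    return limit - overhead

# Hypothetical sender and channel, for illustration only.
print(payload_budget("nick!user@example.org", "#haskell"))  # 469
```

A longer hostname or a channel name in a multi-byte script eats directly into that number, which is why clients tend to budget conservatively.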

Cyrillic characters are encoded in UTF-8 with two bytes each, so you are limited to 255 characters.
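The arithmetic behind that 255, as a quick Python sketch. It treats all 510 bytes as available for text, so these are upper bounds; the command and target take their share in practice. The sample characters and the helper `max_chars` are my own picks.

```python
LINE_BUDGET = 510  # 512 minus the trailing CR-LF, per the RFC text quoted at the top

def max_chars(sample: str, budget: int = LINE_BUDGET) -> int:
    """Upper bound on characters per line, given one sample character of a script."""
    return budget // len(sample.encode("utf-8"))

for script, sample in [("ASCII", "a"), ("Cyrillic", "я"), ("Thai", "ก"), ("Shavian", "𐑥")]:
    print(script, max_chars(sample))
# ASCII 510, Cyrillic 255, Thai 170, Shavian 127
```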
Don't most IRC clients nowadays query the max line length on connect (the capabilities output) and then just split up one message across multiple lines if you cross that limit? I haven't had to think about line length since the 90s.
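For what the splitting step might look like, here is a minimal Python sketch. It assumes the client has already settled on a byte limit for the text (how it gets that number is left out), and that the limit is at least 4 bytes so any single code point fits. The name `split_utf8` is hypothetical and this is not taken from any particular client.

```python
def split_utf8(text: str, limit: int) -> list:
    """Split `text` into pieces whose UTF-8 encoding is at most `limit` bytes each.

    Splits between code points only, so no piece ends mid-character.
    It does not keep combining marks with their base character.
    """
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if current and size + n > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks

print(split_utf8("привет мир", 8))  # ['прив', 'ет м', 'ир']
```

Splitting like this keeps every line valid UTF-8, but it does nothing about the other complaint in this thread: other people's messages can still land between your pieces.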