by WildUtah 4377 days ago
A technical note: Twitter carries Unicode messages; the limit is not 140 octets but 140 code points [0]. This is especially useful when tweeting in Japanese.

I believe there are more than 65536 code points assigned, so each code point can carry at least 16 bits and a tweet should be able to carry over 140 × 16 = 2240 bits of information -- 2240+ Yos.
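For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes every symbol in the alphabet is equally usable in a tweet, which is an idealization; `capacity_bits` is just a name made up for this example.

```python
import math

TWEET_LENGTH = 140  # code points per tweet, per the comment above


def capacity_bits(alphabet_size: int, length: int = TWEET_LENGTH) -> float:
    """Upper bound on the information in one message, in bits.

    Assumes all `alphabet_size` symbols are usable and equally likely.
    """
    return length * math.log2(alphabet_size)


print(capacity_bits(65536))  # 16 bits per code point
print(capacity_bits(256))    # 8 bits per symbol, i.e. a 140-octet limit
```

With 65536 usable code points this gives 2240 bits, versus 1120 bits if the limit were 140 octets.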

---

Incidentally, in Japanese and BEV (probably adopted from Japanese), the word "yo" has an entirely different meaning.

[0] http://www.joelonsoftware.com/articles/Unicode.html

Edit: added endnote

2 comments

A good one, thanks. Yes, that makes good sense. I guess that in some languages Twitter really does carry a lot more information than it does in Latin-alphabet languages.

If Twitter only allowed 140 bytes, you'd be really out of luck if your language routinely requires multi-byte sequences in UTF-8.
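To make the difference concrete, a quick Python sketch comparing the two ways of counting. `tweet_cost` is a hypothetical helper for this example, not anything Twitter exposes.

```python
def tweet_cost(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for a message."""
    return len(text), len(text.encode("utf-8"))


for sample in ("hello", "こんにちは"):
    code_points, utf8_bytes = tweet_cost(sample)
    print(f"{sample}: {code_points} code points, {utf8_bytes} UTF-8 bytes")
```

Each hiragana character takes three bytes in UTF-8, so a 140-byte limit would cut a Japanese message to roughly a third of what a 140-code-point limit allows.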

Hm. You could do a 'tweet compression' trick where you take a fixed number of bits from a subset of Unicode code points that you know encode as multi-byte sequences in UTF-8 (selected for the extra long sequences) in order to put longer messages on Twitter, and then use a decompressor to turn it back into ASCII.
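A minimal sketch of that trick in Python, under some loud assumptions: it packs two 7-bit ASCII characters (14 bits) into one code point, uses the range starting at U+4E00 (CJK Unified Ideographs) as the carrier purely for illustration, and assumes the service counts each of those code points as a single character. `BASE`, `pack` and `unpack` are names invented here.

```python
BASE = 0x4E00  # assumed carrier range: U+4E00..U+8DFF holds 2**14 values


def pack(ascii_text: str) -> str:
    """Pack two 7-bit ASCII characters into each output code point."""
    data = ascii_text.encode("ascii")  # raises on non-ASCII input
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a NUL
    return "".join(
        chr(BASE + ((data[i] << 7) | data[i + 1]))
        for i in range(0, len(data), 2)
    )


def unpack(packed: str) -> str:
    """Reverse pack(); trailing NULs are treated as padding and dropped."""
    out = bytearray()
    for ch in packed:
        value = ord(ch) - BASE
        out += bytes((value >> 7, value & 0x7F))
    return out.decode("ascii").rstrip("\x00")


message = "Yo " * 93 + "Y"  # 280 ASCII characters
packed = pack(message)
print(len(message), "->", len(packed))
```

That doubles the ASCII payload to 280 characters per 140 code points. A wider carrier range would pack more bits per code point, at the cost of having to dodge unassigned code points and surrogates.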

"YoTor" could push innocuous "Yo" messages out over disposable one-time use IPv6 addresses, and all of the side-band message data could be transmitted in the IPv6 address. "The source address is the message."
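Just to sketch what "the source address is the message" could look like: the Python below stuffs up to 8 bytes into the low 64 bits of an IPv6 address using the standard `ipaddress` module. The prefix is the 2001:db8::/32 documentation range standing in for whatever /64 the sender really controls, and `message_to_address` / `address_to_message` are hypothetical names. Actually sending packets from such addresses is left out entirely.

```python
import ipaddress

# Placeholder /64 from the documentation range; a real sender would use its own.
PREFIX = ipaddress.IPv6Network("2001:db8::/64")
HOST_BYTES = 8  # the low 64 bits of the address carry the payload


def message_to_address(chunk: bytes) -> ipaddress.IPv6Address:
    """Encode up to 8 bytes of payload in the interface identifier."""
    if len(chunk) > HOST_BYTES:
        raise ValueError("at most 8 bytes fit in one address")
    host = int.from_bytes(chunk.ljust(HOST_BYTES, b"\x00"), "big")
    return ipaddress.IPv6Address(int(PREFIX.network_address) + host)


def address_to_message(addr: ipaddress.IPv6Address) -> bytes:
    """Recover the payload; trailing NULs are treated as padding."""
    host = int(addr) & ((1 << 64) - 1)
    return host.to_bytes(HOST_BYTES, "big").rstrip(b"\x00")


addr = message_to_address(b"secret!")
print(addr, address_to_message(addr))
```

A longer message would be split into 8-byte chunks, one disposable address per "Yo".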