by KingLancelot 1255 days ago
UTF-8 is 1-4 bytes per codepoint dude, not 1-2.

280 * 4 = 1120, not 560.
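
For anyone who wants to check the byte counts, here is a minimal Python sketch. The four sample characters are arbitrary picks for illustration, one per UTF-8 length class; they are not from the thread.

```python
# UTF-8 byte length for one code point from each length class.
# The sample characters are arbitrary illustrations.
samples = ["A", "é", "漢", "😀"]  # U+0041, U+00E9, U+6F22, U+1F600
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]

# Worst case for 280 code points: every one of them needs 4 bytes.
max_bytes = 280 * 4
print(max_bytes)  # 1120
```

So 560 bytes would only be enough if no code point ever needed more than 2 bytes.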


Twitter doesn't allow 280 of just any codepoint. Some of them, CJK characters for example, count double against your limit.
Ah, so Twitter’s Unicode implementation is fucked.
Nothing is wrong with it.

If it counted UTF-16 code units, that would be dumb. It doesn't. The cutoff was deliberately set to keep the 140-character limit for CJK and raise it to 280 for everything else, and they did that based on observational data.
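
A rough sketch of what that kind of weighted counting could look like in Python. Everything named here is an assumption for illustration: the code point ranges are modeled on what I recall of the published twitter-text configuration and are not verified against it, and `weighted_length` and `fits` are hypothetical helpers. It also ignores things the real counter handles, such as normalization, URLs, and emoji sequences.

```python
# Illustrative weighted counter. The ranges and weights below are
# assumptions for this sketch, not a verified copy of Twitter's config.
LIGHT_RANGES = [
    (0x0000, 0x10FF),  # assumed: Latin, Greek, Cyrillic, Arabic and similar
    (0x2000, 0x200D),  # assumed: general punctuation spaces
    (0x2010, 0x201F),  # assumed: dashes and quotation marks
    (0x2032, 0x2037),  # assumed: primes
]
LIGHT_WEIGHT = 1          # code points inside the ranges above
HEAVY_WEIGHT = 2          # everything else, including CJK
MAX_WEIGHTED_LENGTH = 280


def weighted_length(text: str) -> int:
    """Sum a per-code-point weight over the text."""
    total = 0
    for ch in text:
        cp = ord(ch)
        is_light = any(lo <= cp <= hi for lo, hi in LIGHT_RANGES)
        total += LIGHT_WEIGHT if is_light else HEAVY_WEIGHT
    return total


def fits(text: str) -> bool:
    """True if the text stays within the weighted limit."""
    return weighted_length(text) <= MAX_WEIGHTED_LENGTH


print(weighted_length("a" * 280))   # 280
print(weighted_length("漢" * 140))  # 280
print(fits("漢" * 141))             # False
```

Under these assumed weights, 280 Latin letters and 140 CJK characters both land exactly on the limit, which matches the 140/280 split described above. Note that the count is per code point with a weight, not per UTF-8 byte or UTF-16 code unit.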

https://cdn.cms-twdigitalassets.com/content/dam/blog-twitter...

https://blog.twitter.com/en_us/topics/product/2017/Giving-yo...