Hacker News new | ask | show | jobs
by jandrese 2252 days ago
> It is a pain in the ass to have a variable number of bytes per char.

Maybe, but nobody can stomach the wasted space you get with UTF-32 in almost every situation. The encoding time tradeoff was considered less objectionable than making most of your text twice or four times larger.

1 comments

And as the article points out, even then you might have more than one code point for a character.

> For example, the only way to represent the abstract character ю́ cyrillic small letter yu with acute is by the sequence U+044E cyrillic small letter yu followed by U+0301 combining acute accent.