Hacker News
by noselasd 4641 days ago
UTF-8 encodes Unicode code points, so it's Unicode, or some external entity that converts between character sets, that has to deal with those issues, not UTF-8.

UTF-8 would pretty much only need to be updated if the Unicode standard redefines what a code point is (e.g. starts using floating point, decimals, imaginary numbers or something else that is also unlikely to happen).

1 comments

> UTF-8 would pretty much only need to be updated if the Unicode standard redefines what a code point is (e.g. starts using floating point, decimals, imaginary numbers or something else that is also unlikely to happen).

Or if they decide that they need more codepoints, so some invalid-but-possible UTF-8 byte sequences suddenly become valid.
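To make "invalid-but-possible" concrete, here is a rough sketch using Python's built-in UTF-8 decoder. The two sample byte strings and the helper name `is_valid_utf8` are my own picks for illustration: both samples follow the UTF-8 bit layout but carry values above U+10FFFF, so a current decoder should reject them.

```python
# Byte sequences that follow the UTF-8 bit layout but carry values above U+10FFFF.
# These samples are illustrative choices, not taken from the thread.
candidates = {
    "4-byte form, value 0x110000": b"\xf4\x90\x80\x80",
    "5-byte form, value 0x200000": b"\xf8\x88\x80\x80\x80",
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


results = {label: is_valid_utf8(raw) for label, raw in candidates.items()}
for label, ok in results.items():
    print(f"{label}: {'valid' if ok else 'rejected'}")
```

If the code point limit were ever raised, sequences like these are the ones that would have to become valid.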

There's no reason for that. UTF-8 is only there to encode Unicode code points, and the whole range of code points (including the 80% not yet assigned) can be expressed in UTF-8.
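A quick way to check this, sketched in Python and assuming the current upper limit of U+10FFFF: the highest code point fits in four bytes, and the four-byte form still has headroom beyond it. The names `MAX_CODE_POINT`, `utf8_length` and `four_byte_capacity` are just labels for this example.

```python
MAX_CODE_POINT = 0x10FFFF  # current upper limit of the Unicode code space

# The highest code point needs exactly four bytes in UTF-8.
encoded = chr(MAX_CODE_POINT).encode("utf-8")
print(" ".join(f"{b:02x}" for b in encoded))  # f4 8f bf bf


def utf8_length(code_point: int) -> int:
    """Number of bytes Python's UTF-8 encoder emits for one code point."""
    return len(chr(code_point).encode("utf-8"))


# Encoded length at each boundary where UTF-8 adds a byte.
boundaries = [0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, MAX_CODE_POINT]
lengths = [utf8_length(cp) for cp in boundaries]
print(lengths)  # [1, 2, 2, 3, 3, 4, 4]

# A four-byte sequence carries 3 + 6 + 6 + 6 = 21 payload bits.
four_byte_capacity = (1 << 21) - 1
print(hex(four_byte_capacity))  # 0x1fffff
```

So every code point up to U+10FFFF, assigned or not, already has a UTF-8 encoding. The byte layout itself only changes if the limit moves.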