Hacker News new | ask | show | jobs
by faragon 3397 days ago
UTF-16 has reserved codes as well, so it could be expanded for covering 2^32 codes, too.
1 comments

The range U+D800-DFFF is reserved for UTF-16 surrogates, specifically in two pairs of low and high surrogates. That means every surrogate pair can encode 10 + 10 bits of information, which is where the 16 astral planes (4 bits of 16-bit planes) comes in. Otherwise, there are just 128 code points in unallocated blocks in the BMP.

There is no space for expansion without reassigning private use areas or changing the encoding mechanism of surrogates--which is currently completely specified (each surrogate pair will produce a valid code point).