by vorg
4040 days ago
> But we don't seem to be running out

The issue isn't the quantity of unassigned codepoints, it's how many private use ones are available: only about 137,000 (6,400 in the BMP plus 65,534 in each of planes 15 and 16). Publicly available private use schemes such as ConScript are fast filling up this space, mainly by encoding block characters the same way Unicode encodes Korean Hangul, i.e. by using a formula over a small set of base components to generate all the block characters.

My own surrogate scheme, UTF-88, implemented in Go at https://github.com/gavingroovygrover/utf88 , expands the number of codepoints to the 2 billion that UTF-8 originally specified, by using the top 75% of the private use codepoints as second-tier surrogates. The same scheme can easily be fitted on top of UTF-16 instead. I've taken the liberty in this scheme of making 16 planes (0x10 to 0x1F) available as private use; the rest are unassigned.

I created this scheme to help in using a formulaic method to generate a commonly used subset of the CJK characters, perhaps in the codepoints that would take 6 bytes under UTF-8. That would be more difficult than the Hangul scheme because CJK characters are built recursively. If successful, I'd look at pitching the UTF-88 surrogation scheme for UTF-16 and having UTF-8 and UTF-32 officially extended to 2 billion characters.
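For anyone unfamiliar with the Hangul formula mentioned above, here is a minimal Go sketch of it. The constants come from the Unicode Standard's Hangul syllable composition rules; the function name `composeHangul` and its index-based signature are my own illustrative choices and are not taken from the utf88 package.

```go
package main

import "fmt"

// Constants from the Unicode Hangul syllable composition algorithm.
const (
	sBase  = 0xAC00 // first precomposed syllable, U+AC00
	lCount = 19     // leading consonants
	vCount = 21     // vowels
	tCount = 28     // trailing consonants, index 0 meaning "none"
)

// composeHangul (hypothetical helper) maps component indices to the
// precomposed syllable codepoint. It reports false if an index is out of range.
func composeHangul(l, v, t int) (rune, bool) {
	if l < 0 || l >= lCount || v < 0 || v >= vCount || t < 0 || t >= tCount {
		return 0, false
	}
	return rune(sBase + (l*vCount+v)*tCount + t), true
}

func main() {
	// Leading index 18, vowel index 0, trailing index 4 give the syllable "han".
	r, ok := composeHangul(18, 0, 4)
	fmt.Printf("%c U+%04X %v\n", r, r, ok) // prints: 한 U+D55C true
}
```

With 19 × 21 × 28 combinations, the formula covers all 11,172 precomposed syllables from just 68 component indices, which is the kind of compression a formulaic block-character scheme relies on.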