by trinovantes 1542 days ago
I'm surprised obscure languages like Oji-Cree have Unicode code points

As more human languages go extinct, I wonder if people in the future will forget where some Unicode code points come from and try to repurpose them

5 comments

It's one thing Unicode does well. I haven't come across a living natural language that's not represented in Unicode, and some extinct languages are too. There are hieroglyphics (classic and demotic), Sumerian, etc. Mayan and Aztec logosyllabaries don't seem to be officially allocated yet, but there's a proposed range for them.
We even have Unicode codepoints for alphabets like Shavian or Deseret which were never in widespread use but currently exist as linguistic curiosities.
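For anyone who wants to check this locally, here is a minimal sketch using Python's standard `unicodedata` module. The sample code points (one from each of the Shavian, Deseret, Egyptian hieroglyph and cuneiform blocks) and the helper name `describe` are my own picks for illustration, and what gets printed depends on the Unicode version bundled with your Python build.

```python
import unicodedata

# Sample code points chosen for illustration: one from each script's block.
SAMPLES = {
    "Shavian": 0x10450,
    "Deseret": 0x10400,
    "Egyptian hieroglyphs": 0x13000,
    "Cuneiform": 0x12000,
}


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME', or 'U+XXXX <unassigned>' if no name is known."""
    # unicodedata.name() returns the default when the character has no
    # name in the Unicode database shipped with this Python build.
    name = unicodedata.name(chr(code_point), "<unassigned>")
    return f"U+{code_point:04X} {name}"


for script, cp in SAMPLES.items():
    print(f"{script}: {describe(cp)}")
```

A code point from a script that is only proposed and not yet encoded would come back as `<unassigned>`.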
I suppose that's the point: it's not just Unicode being quirky. Imagine trying to publish a paper on language X and its unique, dead script. It'd be harder now in the 21st century without it being in Unicode than it would've been in the early 20th (without Unicode existing at all). What would you do? Append an image and refer to the characters numerically?
Probably, preferably as a vector file, and then reference it with LaTeX (or whatever you're typesetting your paper with) so it shows up as part of the rendered document. I.e. the same way you include non-Unicode items in your paper.
There are ~130 known unencoded scripts, about 70 historical and 60 modern (some of which are extinct, most of which just don't have that many users). Most of these have proposed ranges, but aren't actually in Unicode yet. See https://linguistics.berkeley.edu/sei/index.html, which is the main group working to get the rest finished.
I'm trying to think of an example of a standard where the fixed-size index got full but reuse was a viable long-term solution instead of either extension or complete replacement. It seems that once that point is hit, the effort to reuse isn't worth the short amount of time it gains. In IPv4, for example, we reached the second level of NAT before anyone even tried to start using 0/8, which was set aside simply to not be used, not even assigned and forgotten.
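To put a rough number on how little reclaiming 0/8 buys, here is a small sketch using Python's standard `ipaddress` module. It only counts addresses; the variable names are mine, and it ignores any practical obstacles to actually routing that block.

```python
import ipaddress

# The block that was set aside, and the whole IPv4 space for comparison.
reserved = ipaddress.ip_network("0.0.0.0/8")
whole_space = ipaddress.ip_network("0.0.0.0/0")

# A /8 leaves 24 host bits, so it holds 2**24 addresses out of 2**32.
share = reserved.num_addresses / whole_space.num_addresses

print(reserved.num_addresses)  # 16777216
print(f"{share:.2%}")          # 0.39%
```

So the whole block is 1/256 of the address space, which fits the point that reuse gains very little time.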
That's just for debugging Hammurabi's code.
They added a Linear A block despite the language still being undeciphered, so not having living speakers is not a barrier to being in Unicode.
¯\_(ツ)_/¯