HN Mirror


	by trinovantes 1589 days ago
	I'm surprised obscure languages like Oji-Cree have Unicode code points. As more human languages go extinct, I wonder if people in the future will forget where some code points come from and try to repurpose them.

5 comments

sterlind 1589 days ago

It's one thing Unicode does well. I haven't come across a living natural language that's not represented in Unicode, and some extinct languages are covered too: there's hieroglyphics (classic and demotic), Sumerian, etc. The Mayan and Aztec logosyllabaries don't seem to be officially allocated yet, but there's a proposed range for them.

mFixman 1589 days ago

We even have Unicode code points for alphabets like Shavian or Deseret, which were never in widespread use and now exist mainly as linguistic curiosities.
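
If you want to check this yourself, here is a rough sketch using Python's standard `unicodedata` module. The sample code points are my own picks (one from each block mentioned in this thread), not anything official, and what gets printed depends on the Unicode version bundled with your Python build.

```python
import unicodedata

# Illustrative sample code points, one per script mentioned in the thread.
# These particular picks are an assumption for the example, not a canonical list.
SAMPLES = {
    "Canadian Aboriginal Syllabics": 0x1401,
    "Deseret": 0x10400,
    "Shavian": 0x10450,
    "Linear A": 0x10600,
    "Cuneiform": 0x12000,
    "Egyptian Hieroglyphs": 0x13000,
}


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME', with a placeholder if this Python's Unicode data has no name."""
    name = unicodedata.name(chr(code_point), "<unnamed in this Unicode version>")
    return f"U+{code_point:04X} {name}"


if __name__ == "__main__":
    # The bundled Unicode database version decides which scripts are known.
    print("Unicode database version:", unicodedata.unidata_version)
    for script, code_point in SAMPLES.items():
        print(f"{script:30} {describe(code_point)}")
```

On an older Python with an older Unicode database, some of these may come back as unnamed, which is why the lookup passes a default instead of letting `unicodedata.name` raise.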

OJFord 1589 days ago

I suppose that's the point: it's not just Unicode being quirky. Imagine trying to publish a paper on language X and its unique, dead script. It'd be harder now in the 21st century without it being in Unicode than it would've been in the early 20th (without Unicode existing at all). What would you do? Append an image and refer to the characters numerically?

zamadatix 1589 days ago

Probably, preferably as a vector file, and then reference it with LaTeX (or whatever you're typesetting your paper with) so it shows up as part of the rendered document. I.e., the same way you include non-Unicode items in your paper.

NathanielLovin 1589 days ago

There are ~130 known unencoded scripts, about 70 historical and 60 modern (some of which are extinct, most of which just don't have that many users). Most of these have proposed ranges but aren't actually in Unicode yet. See https://linguistics.berkeley.edu/sei/index.html (the Script Encoding Initiative), which is the main group working to get the rest finished.

zamadatix 1589 days ago

I'm trying to think of an example of a standard where the fixed-size index got full but reuse was a viable long-term solution instead of either extension or complete replacement. It seems that once that point is hit, the effort needed to reuse isn't worth the short amount of time it gains. In IPv4, for example, we reached the second level of NAT before anyone even tried to start using 0/8, which was set aside simply to not be used, not assigned and then forgotten.
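
To put a rough number on how little reclaiming 0/8 buys, here is a back-of-the-envelope sketch with Python's standard `ipaddress` module. The helper name `share_of_ipv4` is made up for the example, and this only counts addresses; it says nothing about how usable the block is in practice.

```python
import ipaddress

# The block discussed above, and the whole IPv4 space for comparison.
ZERO_SLASH_8 = ipaddress.ip_network("0.0.0.0/8")
ALL_IPV4 = ipaddress.ip_network("0.0.0.0/0")


def share_of_ipv4(network: ipaddress.IPv4Network) -> float:
    """Fraction of the full IPv4 address space covered by `network` (hypothetical helper)."""
    return network.num_addresses / ALL_IPV4.num_addresses


if __name__ == "__main__":
    print(ZERO_SLASH_8.num_addresses)            # 16777216 addresses (2**24)
    print(f"{share_of_ipv4(ZERO_SLASH_8):.2%}")  # 0.39% of all IPv4 addresses
```

One /8 is 1/256 of the space, so even a fully reclaimed block is a small fraction of what was already exhausted.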

kettro 1589 days ago

https://en.wikipedia.org/wiki/Canadian_Aboriginal_syllabics

We have Unicode for Sumerian, so I don't think that will ever happen.

https://en.wikipedia.org/wiki/Sumerian_language

mbg721 1589 days ago

That's just for debugging Hammurabi's code.

kej 1589 days ago

They added a Linear A block despite the language still being undeciphered, so not having living speakers is not a barrier to being in Unicode.

coverup 1589 days ago

¯\_(ツ)_/¯
