by chrisseaton 2285 days ago
> So how far off is Unicode from being 'done'?

Do you think human written language is ‘done’ and will never evolve?


I think that most of the changes in Unicode 13 are not from the evolution of human written language. I don't know anyone who's ever written "blueberries" by drawing a picture of some blueberries in the middle of their text.
From the perspective of a person who uses an alphabetical language, such as English, sure, Unicode can be "done". But if your language is based on ideograms, like Chinese, then it'll never be "done". As new words are created, they need to be encoded.
Again, that's great and I understand that (I've studied Japanese), but that's only part of the new version. They're not adding pictures of "mousetrap" and "olives" and "toilet plunger" because any existing language needs to write these.

Furthermore, I'm really starting to question the way CJK is encoded. We don't make every English word a separate codepoint. 97% of these CJK ideographs are just different combinations of the same few radicals. Korean seems especially weird, as it has both the individual letters (jamo) and every precomposed triple (in a block that's been rearranged once or twice, on the basis that nobody was really using it yet). I'm not saying we should nix all precomposed Hanzi/Kanji, exactly, because that's a very convenient way for programs to handle text, but it seems like this system is becoming increasingly awkward for non-western languages.
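To make the "every precomposed triple" point concrete: the 11,172 Hangul syllables at U+AC00..U+D7A3 are laid out so that each one is plain arithmetic on the indices of its leading consonant, vowel, and optional final consonant. Below is a minimal sketch of that mapping, following the Hangul composition formula in the Unicode Standard (chapter 3). The function names `compose` and `decompose` are just illustrative, not any library's API; the checks against Python's `unicodedata` NFC/NFD are only there to confirm the arithmetic matches what normalization produces.

```python
import unicodedata

# Constants of the Hangul syllable composition algorithm (Unicode Standard, ch. 3)
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11,172 precomposed syllables


def compose(lead: str, vowel: str, trail: str = "") -> str:
    """Hypothetical helper: combine conjoining jamo into one precomposed syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0  # index 0 means "no final consonant"
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("not a leading consonant + vowel pair")
    if trail and not 1 <= t_index < T_COUNT:
        raise ValueError("not a trailing consonant")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def decompose(syllable: str) -> str:
    """Hypothetical helper: split a precomposed syllable back into its jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = chr(L_BASE + s_index // (V_COUNT * T_COUNT))
    vowel = chr(V_BASE + (s_index % (V_COUNT * T_COUNT)) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


# Usage: HIEUH + A + NIEUN gives the syllable for "han"
han = compose("\u1112", "\u1161", "\u11ab")
print(han, hex(ord(han)))  # 한 0xd55c

# NFC/NFD normalization in the standard library performs the same arithmetic
assert unicodedata.normalize("NFC", "\u1112\u1161\u11ab") == han
assert decompose(han) == unicodedata.normalize("NFD", han)
```

Which is sort of the point being made: a whole block of code points exists for something a font and a three-line formula could already express.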

I feel there's a fundamental flaw when our "universal" text encoding system can't handle the regular creation of new words in a well-understood way, for languages spoken by a third of the world's population. It's like we're issuing hardware patches for a software problem.

It is. I do not like the way CJK is being dealt with. Not to mention that fonts don't include all the CJK variants, for when I use the same character but need the Japanese or Korean variant because that is how it was supposed to be used.

Even the "C" has traditional and simplified variants.

Fortunately, I think Unicode is pretty much done for alphabetical languages. Someday, if the CJK design in Unicode isn't good enough, breaking it off into something better isn't entirely impossible.

Emojis literally are an evolution in human written language. They started with youth texting and are now showing up in business emails. I predict that within 50 years we'll see emojis as a routine component of New York Times articles.
Now you're getting into the definition of "writing". I would say I've only seen emoji typed, not written. (Before anyone asks: yes, I've seen cuneiform written. I have some interesting friends.) If you count any visual communication that is typed on a phone under the greater umbrella of "writing", then we could also include colors, styles, orientation, funny fonts, image memes, animation, etc. There's no end to the possible visual communication that people might want to transmit digitally.

Where do you draw the line? I draw it at "anything in or using a language that people might write in the absence of computers, which they would then reasonably want to store and transmit using a computer". I don't include "any possible visual communication that can occur using a computer". That's far too broad to define "text", or be part of any existing "language", which are the stated goals of Unicode.

I'm not going to hold my breath on this one. Emojis have an air of informality that is not appropriate in many circumstances. Imagine writing a death notice with emojis.
Well one could use U+1303F.

But you wait until Maya script gets into Unicode. You'll have at least three different Maya codepoints for death. (-: