| HN Mirror



	by AdamH12113 1375 days ago
	> This version adds 4,489 characters, bringing the total to 149,186 characters. These additions include two new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs.

	That seems like a lot of new CJK characters! How did they end up with so many new characters after so long? Is there some gradual process of adding historical or extremely rare characters, or were some deliberately left out of earlier versions?

3 comments

lifthrasiir 1375 days ago

More like the former. Earlier versions of the standard did deliberately omit characters under a policy called Han unification [1], but that policy has been largely relaxed thanks to the expansion of the Unicode code point space in 2.0, the disunification processes that followed, and the eventual introduction of the Ideographic Variation Database [2] to handle the remaining cases.

[1] https://en.wikipedia.org/wiki/Han_unification

[2] https://unicode.org/ivd/
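
For readers unfamiliar with how the Ideographic Variation Database works in practice: a variant glyph is requested by following the unified code point with a variation selector from the range U+E0100 to U+E01EF. Here is a minimal Python sketch of that idea. The helper name `split_ivs` is hypothetical, and the pairing of U+845B with U+E0100 is only an illustration; whether a given sequence is actually registered has to be checked against the IVD itself.

```python
# Variation selectors VS17..VS256 (U+E0100..U+E01EF), the range used by
# ideographic variation sequences.
IVS_SELECTORS = range(0xE0100, 0xE01F0)


def split_ivs(text):
    """Group text into (base, selector) pairs.

    selector is None when a character is not followed by a selector
    from the range above. Hypothetical helper, for illustration only.
    """
    pairs = []
    for ch in text:
        if ord(ch) in IVS_SELECTORS and pairs and pairs[-1][1] is None:
            # Attach the selector to the character that precedes it.
            pairs[-1] = (pairs[-1][0], ch)
        else:
            pairs.append((ch, None))
    return pairs


base = "\u845B"                # one unified code point
variant = base + "\U000E0100"  # same code point plus VS17 (illustrative pairing)

print(len(base), len(variant))  # 1 2
print(base == variant)          # False
print(split_ivs(variant + base))
```

The base character stays the same code point in both cases, so software that ignores the selector still sees the unified character; only the rendering hint differs.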


ks2048 1375 days ago

On the Wikipedia page for "CJK Unified Ideographs Extension H", under "History" (https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extensi...), you can find dozens of linked documents describing why someone thought the characters should be added.

One random example I opened (https://www.unicode.org/L2/L2017/17099-haifeng-county-uax45....) is a 9-page PDF proposing a single character used for "congee shop signs in Haifeng County".


ksec 1375 days ago

I continue to think Han Unification, or Unicode for CJK, is not the best solution to the problem.
