| HN Mirror



	by Kilenaitor 927 days ago
	Why would Unicode bother consolidating code points like this...? Not like it's short on space.

3 comments

kijin 927 days ago

The consolidation effort would not have gone forward unless a significant number of people who have been studying these characters agreed that they were, in fact, the same characters.

Many Hanzi/Kanji/Hanja characters have a long history of stylistic variation and simplification for the sake of aesthetics and convenience. For linguists and historians who are used to such variations, the direction of a minor stroke that doesn't alter the meaning would seem to be a purely stylistic choice, just as serifs on Latin characters don't alter the meaning. Some variations just happen to be more popular in some regions/countries/contexts and not others.

The general public, on the other hand, in each country is educated with the single "correct" variation favored by the government. Everything they read now uses the officially approved variation, so other variations look wrong. That stroke should absolutely not protrude to the other side, or you're a filthy barbarian!

The Unicode consortium seems to have listened more to the linguists and historians in this case. The academic stance doesn't always fit with public perception, which is often seasoned with a large pinch of nationalism.


lifthrasiir 927 days ago

> The consolidation effort would not have gone forward unless a significant number of people who have been studying these characters agreed that they were, in fact, the "same" characters.

The concrete term is the "normalization rules", which dictate how given arbitrary characters are transformed into domestic variants. As far as I know most countries with significant Han character usage already had one before Unicode, and the ROK rules were (and still are being) developed alongside Unicode.


j16sdiz 927 days ago

Except, in the context of Unicode, Han unification rules are very different from normalization rules.

Some background can be found in https://www.unicode.org/versions/Unicode1.0.0/V2ch02.pdf


lifthrasiir 927 days ago

I think you have mistaken "normalization rules" in this context with Unicode normalization algorithms. I'm specifically talking about documents like [1], which are essentially more detailed and concrete versions of the original Han-unification rules.

(Note that normalization rules themselves are distinct from the eventual unification. It takes further work to actually decide whether the unification is possible or not.)

[1] https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg53/IRGN2420_KRNor...
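To make the distinction in this sub-thread concrete, here is a minimal Python sketch of what the Unicode normalization algorithms (NFC, NFD, NFKC, NFKD) do, as opposed to the IRG-style normalization rules discussed above. The sample characters are my own choice, not from the thread: é in composed and decomposed form, and 直 (U+76F4), a unified ideograph that is commonly drawn differently by region.

```python
import unicodedata

# Two encodings of the same accented letter (illustrative choice).
composed = "\u00e9"     # é as one code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# A unified Han code point (illustrative choice): one code point,
# whatever regional glyph a font draws for it.
han = "\u76f4"          # 直

# Unicode normalization reconciles the two encodings of é.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# It has nothing to do for the Han character: unification already
# happened when the code point was assigned, not at normalization time.
han_unchanged = all(
    unicodedata.normalize(form, han) == han
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)

print(composed == decomposed, nfc_equal, han_unchanged)
```

In other words, the regional variants of a unified character never exist as separate code points in the text, so the Unicode normalization forms have nothing to map between; picking the glyph is left to fonts and language tagging.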


rsaxvc 927 days ago

It was, in the old days, 16-bit, IIRC.


Kilenaitor 927 days ago

Ah yes. Just realized the author linked to its wikipedia page. Whoops.


est 927 days ago

Yeah, alphabetic languages solve this problem by assigning different code points to identical-looking characters in different languages, which doesn't cause any problems at all..... except maybe domain phishing. LMAO
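A rough Python sketch of the lookalike problem, for anyone who hasn't seen it. The domain strings are made up for illustration; the point is only that Latin "a" and Cyrillic "а" render almost identically in many fonts but are separate code points, so the two strings are different.

```python
import unicodedata

# Hypothetical example strings, not real lookups.
latin = "apple.com"        # all Latin letters
mixed = "\u0430pple.com"   # first letter is CYRILLIC SMALL LETTER A

def describe(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

print(latin == mixed)      # the strings differ despite looking alike
print(describe(latin)[0])
print(describe(mixed)[0])

# Compatibility normalization does not merge them either, since the
# Cyrillic letter is not a compatibility variant of the Latin one.
print(unicodedata.normalize("NFKC", mixed) == latin)
```

So the non-unified approach keeps scripts cleanly separated in the encoding, and the price is that software has to detect mixed-script lookalikes some other way.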
