by Kilenaitor 927 days ago
Why would Unicode bother consolidating code points like this...? Not like it's short on space.
3 comments

The consolidation effort would not have gone forward unless a significant number of people who have been studying these characters agreed that they were, in fact, the same characters.

Many Hanzi/Kanji/Hanja characters have a long history of stylistic variation and simplification for the sake of aesthetics and convenience. For linguists and historians who are used to such variations, the direction of a minor stroke that doesn't alter the meaning would seem to be a purely stylistic choice, just as serifs on Latin characters don't alter the meaning. Some variations just happen to be more popular in some regions/countries/contexts and not others.

The general public, on the other hand, in each country is educated with the single "correct" variation favored by the government. Everything they read now uses the officially approved variation, so other variations look wrong. That stroke should absolutely not protrude to the other side, or you're a filthy barbarian!

The Unicode consortium seems to have listened more to the linguists and historians in this case. The academic stance doesn't always fit with public perception, which is often seasoned with a large pinch of nationalism.

> The consolidation effort would not have gone forward unless a significant number of people who have been studying these characters agreed that they were, in fact, the "same" characters.

The concrete term is the "normalization rules", which dictate how given arbitrary characters are transformed into domestic variants. As far as I know most countries with significant Han character usage already had one before Unicode, and the ROK rules were (and still are being) developed alongside Unicode.

Except, in the context of Unicode, Han-unification rules are very different from normalization rules.

Some background can be found in https://www.unicode.org/versions/Unicode1.0.0/V2ch02.pdf

I think you have mistaken "normalization rules" in this context with Unicode normalization algorithms. I'm specifically talking about documents like [1], which are essentially more detailed and concrete versions of the original Han-unification rules.

(Note that normalization rules themselves are distinct from the eventual unification. It takes further work to actually decide whether the unification is possible or not.)

[1] https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg53/IRGN2420_KRNor...
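For anyone else confused by the two senses of "normalization": the sketch below shows only the Unicode normalization algorithm (NFC), not the IRG-style rules in [1]. It assumes Python's standard `unicodedata` module, and the sample characters are my own picks for illustration. NFC composes combining sequences and folds compatibility ideographs, but it leaves an already unified ideograph alone, because regional glyph differences are not encoded in the code point at all.

```python
import unicodedata

# Combining sequence: "e" + COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
composed = unicodedata.normalize("NFC", "e\u0301")

# U+76F4 is a unified ideograph commonly drawn differently in Chinese and
# Japanese fonts (illustrative pick). NFC has nothing to do here: one code point.
unified = unicodedata.normalize("NFC", "\u76f4")

# U+F900 is a CJK compatibility ideograph with a canonical mapping to U+8C48.
folded = unicodedata.normalize("NFC", "\uf900")

for label, value in (("composed", composed), ("unified", unified), ("folded", folded)):
    print(label, [f"U+{ord(ch):04X}" for ch in value])
```

Which glyph you actually see for a unified code point is decided by the font or language tag, which is why the algorithm above cannot express the regional preferences the normalization rules deal with.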

It was, in the old days, 16-bit, IIRC.
Ah yes. Just realized the author linked to its Wikipedia page. Whoops.
Yeah, alphabetic languages solve this problem by assigning different code points to the same-looking characters in different languages, which doesn't cause any problems at all..... except maybe domain phishing. LMAO
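A quick sketch of that lookalike problem, assuming Python's standard `unicodedata` module and built-in `idna` codec; the spoofed domain is a made-up example. Latin "a" and Cyrillic "а" render almost identically but are separate code points, so the two strings compare unequal even though they look the same on screen.

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, visually near-identical in most fonts

for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Hypothetical spoofed domain: the second letter is Cyrillic, not Latin.
real = "paypal.com"
spoofed = "p\u0430ypal.com"
same = real == spoofed

# The wire form of the spoofed name is Punycode, which exposes the difference.
wire_form = spoofed.encode("idna")
print(same, wire_form)
```

Browsers lean on exactly this: showing the `xn--` form for suspicious mixed-script names instead of the rendered one.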