Hacker News
by mehta 4105 days ago
> [...] CJK unification[...] has never been a point of contention in the communities concerned with it.

I am not very familiar with the CJK unification project, so take my points with a grain of salt.

> more than not opposing CJK unification, I benefit from it greatly.

I think that is a different point of view, isn't it? You are seeing your benefit, whereas the author is seeing his. Here's an alternative solution: what if the search engine understood what you were searching for and returned results in all the languages? Unification can cause a lot of information loss, in the same way that compressing a photo costs you image quality.

> so there is (to my eyes at least) no value in fragmenting instances of the same character.

But no one is fragmenting instances of the same character. They _are_ different characters from different languages. To take an example from the article, I am not sure how I feel about combining B and β. You end up ignoring either the entire English-speaking population or the Greek-speaking one. Given that you have complete flexibility to assign a code point to each of them, why not do it (responsibly)?


No experience in Asian languages, but J is pronounced differently in English and German. Should they have unique characters?
The example characters, 日、中、力, however, are written the same way in both Chinese and Japanese, as far as I understand (admittedly, I only studied Japanese).

There are admittedly variants that should be encoded separately; however, unifying visually identical glyphs is a "good thing" IMHO.

I'm fluent in Japanese and speak some Mandarin Chinese as well. These 3 characters are identical, not similar.

For a different example, 国 and 國 used to be the same character, but China and Japan (left) have both diverged from the traditional form still used in Taiwan (right). Unicode treats them as separate characters.
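A quick way to check that Unicode keeps these forms apart is to print their code points. Here is a minimal Python sketch using only the standard `unicodedata` module; the second pair (华/華, from "China") is my own addition for comparison, not something the comment above mentions.

```python
import unicodedata

# Simplified forms paired with their traditional forms.
# (国 is also the modern Japanese form; 华/華 is an extra illustrative pair.)
pairs = [("国", "國"), ("华", "華")]

for simplified, traditional in pairs:
    for ch in (simplified, traditional):
        # Print the code point and the official Unicode character name.
        print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}")
    # Separate code points, so both forms can coexist in one string.
    print("same code point:", simplified == traditional)
```

Because each form has its own code point, a single text can contain both 中华 and 中華 without any font trickery.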

今 looks slightly different in traditional Chinese than in the other languages. In traditional Chinese, the little straight line between the two angled lines is sloped, while it is horizontal in simplified Chinese, Japanese, and Korean. Any reader of any of these languages would have no issue if the variant they are used to were replaced by the other one. They might think you have sloppy handwriting or an ugly font, if they even notice, but that's about it. Unicode treats them as the same character.
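By contrast, 今 gets a single code point whatever language the surrounding text is in. A minimal Python sketch (standard library only; the example words 今日 and 今天, both meaning "today", are my own choice) showing that Japanese and Chinese text share it:

```python
import unicodedata

japanese_today = "今日"  # Japanese "today"
chinese_today = "今天"   # Chinese "today"

first_in_japanese = japanese_today[0]
first_in_chinese = chinese_today[0]

# Both words begin with the very same code point. Whether the short middle
# stroke is drawn sloped or horizontal is a glyph choice made by the font.
print(f"U+{ord(first_in_japanese):04X} {unicodedata.name(first_in_japanese)}")
print("same code point:", first_in_japanese == first_in_chinese)
```

Which variant you actually see therefore depends on rendering. In HTML, for example, a `lang` attribute such as `lang="ja"` or `lang="zh-Hant"` typically steers the browser toward a font with the matching glyph.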

Again, this seems like a perfect case for handling at the rendering level rather than the encoding level. The English letter 'a' can be rendered as a ring with a tail (the way I handwrite it) or as a ring with a cap and a tail (the way fonts usually render it). Both are the same letter, just rendered differently by my (context-sensitive) font.

I disagree. I want to be able to write both People's Republic of China 中华人民共和国 and Republic of China 中華民國 in the same text, and if I had to choose between rendering 国+华 or 國+華 throughout, that wouldn't work.

Curious, as I'm not sure when that would actually happen in real life (in Chinese). Generally, in mainland China the ROC would be written with 国, even officially [1], and in Taiwan the PRC would be written with 國 [2].

It gets a bit weirder in Japanese, where the two words are distinctly not the same: one is a traditional version (a proper noun) of the other, and you could imagine a text using both (like William vs. Wilhelm vs. Will).

[1] http://baike.baidu.com/view/2200.htm
[2] https://www.google.com.tw/?gws_rd=ssl#q=%E4%B8%AD%E5%8D%8E%E...

That is true, mainland China writes "中华民国".

But I still want to be able to write texts like this discussion: mainly in English, but containing fragments of Chinese in which I can use both traditional and simplified characters.

And it also makes total sense to me that 日本 is Japan, both in Japanese and Chinese, using the exact same Unicode characters.

Except you can still recognize the 'a' as 'a' no matter which way it is rendered.

Not so with Chinese characters. For instance, the character for "fly" in simplified (飞) and traditional (飛) look very different. Someone who only learned simplified may not recognize the traditional character as being the same.

Which is exactly why 飞 and 飛 are encoded separately. I don't see any problem with that.

Yes, but other characters that also look different are merged. Here's an example: http://www.tofugu.com/2012/04/04/the-sorry-state-of-japanese...

That's the character for "cold". If you showed me (a Chinese speaker) the Japanese or Korean variant, I would have no idea what it meant.