Hacker News
by bane 872 days ago
I've thought about this a little bit and wonder if what CJK readers have ended up with is the product of a few interesting historical quirks.

Assuming that homophones are simply going to exist in the languages, there are probably other alternatives. For example, in the case of modern Korean readers, Hanja can supposedly be used to clarify the meaning of a homophone where context doesn't. But in practice, most Korean readers don't use Hanja enough to remember more than a small percentage of what they learn in school, so the Hanja work in effect as an index into a lookup table: readers look the Hanja up in a dictionary, find the definition written in Hangul, and use that to determine the meaning.

Suppose Japanese were reformed to use one or both Kana exclusively, in the way that modern Koreans use mostly Hangul. Kanji would then be used to help determine meaning in the same way. As most people wouldn't encounter Kanji with enough frequency and variety to remember the bulk of them, only the most common would be remembered. These are unlikely to be the difficult-to-discern homophones, since contextual clues would clarify those. So the Kanji would again end up as indexes for looking up standardized definitions. In modern Japanese this doesn't happen, because Kanji are still used with enough frequency and variety that most people retain some memory of the system. The question is how well they remember it. [1][2][3][4][5]

1 - https://youtu.be/sJNxPRBvRQg
2 - https://youtu.be/IARguDQIGVs
3 - https://youtu.be/cpAnrVYMJho
4 - https://youtu.be/PhtOewdIxII
5 - https://youtu.be/-E6vHCT0wpw

So for non-Sino CJK languages, instead of spending so much time in education learning what amounts to a very complex lookup-table indexing scheme, why not just standardize the definitions and attach the definition numbers as postfixes to ambiguous homophones of Chinese origin?
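
To make the idea a bit more concrete, here's a rough Python sketch of what a numbered-postfix scheme might look like. Everything in it is made up for illustration: the `reading:N` notation, the `DEFINITIONS` table, the numbering of the senses, and the `resolve` helper. No such standard actually exists, and the two Korean entries are just familiar Sino-Korean homophones picked as examples.

```python
# Hypothetical "standardized definitions" table: reading -> numbered senses.
# The numbering is invented for this sketch; no real standard assigns it.
DEFINITIONS = {
    "사과": {1: "apple (沙果)", 2: "apology (謝過)"},
    "수도": {1: "capital city (首都)", 2: "water supply (水道)"},
}


def resolve(token: str) -> str:
    """Resolve a 'reading:N' token to its numbered definition.

    Tokens without a ':N' postfix are returned unchanged, on the
    assumption that context alone is enough for them.
    """
    reading, sep, index = token.rpartition(":")
    if not sep:
        return token  # no postfix, nothing to disambiguate
    senses = DEFINITIONS.get(reading)
    if senses is None or not index.isdecimal() or int(index) not in senses:
        raise KeyError(f"no definition {index!r} for {reading!r}")
    return senses[int(index)]


if __name__ == "__main__":
    for tok in ["사과:1", "사과:2", "수도"]:
        print(tok, "->", resolve(tok))
```

The point is only that the postfix plays the same role the Hanja plays today: it's a key into a table of definitions, just one that doesn't require memorizing the characters.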

1 comment

So, to English speakers, homophones might look like an unfortunate side effect of using an inferior symbol set or something. I don't think that is how languages end up with a lot of homophones. Rather, I think it's the result of a lossy compression: there is simply a set of the same tokens and circuits of thought that get addressed from different contexts for different effects.

And I think your lookup-table thinking is halfway to a deep understanding of the matter: it does work like a forward lookup from ideograms to meanings and pronunciation cues. Sure, you can sub:4 idea:5 wit:1 phne:2 Alph, and homo:7 dis:32 with num:2 idx for std:6 def:3, but I wonder if we humans are really good with that data model.