by lynguist 872 days ago
1) You can introduce the innovation of spaces, as used for example in older Japanese video games that had no Kanji.

2) If that were a problem, oral Japanese would be unintelligible. It is not. Most homonyms can be resolved.

Furthermore, they could adopt a Korean system, where everyone uses a non-Kanji system, but the Kanji are still taught in high school as a type of “Latin”/“Greek” from which they draw literally 55% of their vocabulary.

2 comments

Regarding your point (2), it's not that simple, since written / literary language uses a larger vocabulary, making homonyms more of a problem.

In fact, if you search a Japanese dictionary in hiragana with a combination of two reasonably common kanji readings (say かん+ちょう), you'll often get a double-digit number of results (12 in jmdict in this example).

Most of these are uncommon words unlikely to be used in the spoken language, but could occur in writing.

It could probably still work more or less by relying on context, but it's more of an issue than you make it sound.
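The dictionary lookup described above can be sketched roughly as follows. This is only an illustration: the entries are a small hand-picked sample rather than actual jmdict data, and `build_reading_index` and `homophones` are hypothetical helper names.

```python
from collections import defaultdict

# Hand-picked sample entries (reading, written form, gloss).
# Assumption: this is illustrative data only, not a jmdict export.
SAMPLE_ENTRIES = [
    ("かんちょう", "官庁", "government office"),
    ("かんちょう", "干潮", "low tide"),
    ("かんちょう", "艦長", "captain of a warship"),
    ("かんちょう", "館長", "director of a museum or library"),
    ("かんちょう", "浣腸", "enema"),
    ("でんわ", "電話", "telephone"),
]


def build_reading_index(entries):
    """Group written forms and glosses by their hiragana reading."""
    index = defaultdict(list)
    for reading, written, gloss in entries:
        index[reading].append((written, gloss))
    return index


def homophones(index, reading):
    """Return every (written form, gloss) pair sharing the given reading."""
    return index.get(reading, [])


index = build_reading_index(SAMPLE_ENTRIES)
for written, gloss in homophones(index, "かんちょう"):
    print(written, gloss)
```

Written in hiragana only, every one of those entries collapses to the same string, so the reader has nothing but context to choose between them.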

Another point is that writing in hiragana with spaces would disconnect the language from its roots. The Kanji add a layer of meaning to the language that isn't there in languages with phonetic alphabets, and help with guessing the meanings of unknown words.

You could probably argue that making it easier to learn to read and write would outweigh the loss of those benefits, but I'm not so sure. Japan has a high literacy rate, so it does seem to work alright.

I've thought about this a little bit and wonder if the system CJK readers have ended up with is the result of a few interesting historic quirks.

Assuming that homophones are simply going to exist in the languages, there are probably other alternatives. For example, in the case of modern Korean readers, the Hanja can supposedly be used to help clarify the meaning of a homophone where context doesn't provide clarity. But in practice, most Korean readers don't use Hanja enough to remember more than a small percentage of what they learn in school, so they work in effect as an index into a lookup table, where they look the Hanja up in a dictionary, find the definition written in Hangul, and use that to determine the meaning.

Suppose Japanese were reformed to entirely use one or both Kana in the way that modern Koreans use mostly Hangul. Then Kanji would be used to help determine meaning in the same way. As most people wouldn't encounter Kanji with enough frequency and distribution to remember the bulk of them, only the most common would be remembered. These are unlikely to be the difficult-to-discern homophones, as contextual clues would clarify. So the Kanji would again end up as indexes for looking up standardized definitions. In modern Japanese, this doesn't happen because Kanji are still used with enough frequency and variety that most people sustain some level of memory about the system. The question is how well they remember it. [1][2][3][4][5]

1 - https://youtu.be/sJNxPRBvRQg
2 - https://youtu.be/IARguDQIGVs
3 - https://youtu.be/cpAnrVYMJho
4 - https://youtu.be/PhtOewdIxII
5 - https://youtu.be/-E6vHCT0wpw

So for non-Sino CJK languages, instead of spending so much time in education learning what amounts to a very complex lookup table indexing scheme, why not just standardize the definitions and use the numbers of the definitions as post-fixes to ambiguous homophones with Chinese origins?
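To make the numbered-definition idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `SENSES` table, its ordering, and the `annotate` / `resolve` helpers are hypothetical, and no such standardized numbering exists.

```python
import re

# Hypothetical standardized sense table: reading -> numbered definitions.
# Assumption: the ordering is arbitrary and invented for this example.
SENSES = {
    "かんちょう": ["government office", "low tide", "captain of a warship"],
    "でんわ": ["telephone"],
}


def annotate(reading, sense_number):
    """Append the 1-based sense number, but only if the reading is ambiguous."""
    senses = SENSES[reading]
    if not 1 <= sense_number <= len(senses):
        raise ValueError(f"{reading} has no sense {sense_number}")
    return reading if len(senses) == 1 else f"{reading}{sense_number}"


def resolve(token):
    """Split a token into reading and optional number, then look up the sense."""
    match = re.fullmatch(r"(\D+)(\d*)", token)
    if match is None:
        raise ValueError(f"cannot parse token: {token!r}")
    reading, number = match.groups()
    senses = SENSES[reading]
    if not number:
        if len(senses) > 1:
            raise ValueError(f"{reading} is ambiguous without a number")
        return senses[0]
    sense_number = int(number)
    if not 1 <= sense_number <= len(senses):
        raise ValueError(f"{reading} has no sense {sense_number}")
    return senses[sense_number - 1]


print(annotate("かんちょう", 2))
print(resolve("かんちょう2"))
```

Unambiguous words stay bare, so the numbers would only show up where context alone might fail.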

So, to English speakers, homophones might look like an unfortunate side effect of using an inferior symbol set or something, but I don't think that is how languages end up with a lot of homophones. Rather, I think it's the result of lossy compression: there is simply a set of identical tokens and circuits of thought that are addressed from different contexts for different effects.

And I think your lookup table thinking is halfway to a deep understanding of the matter: it does work like a forward lookup from ideograms to meanings and pronunciation cues. Sure, you can sub:4 idea:5 wit:1 phne:2 Alph, and homo:7 dis:32 with num:2 idx for std:6 def:3, but I wonder if we humans are really good with that data model.

I’m (slowly and painfully) learning to read Thai and find myself wondering how a written language could naturally evolve without spaces between words. It adds a significant overhead since it requires learning the word boundary rules.

Spaces between words are a relatively recent addition to scripts like the Greek or Latin alphabets, dating to roughly 1500 years ago. Early Greek was also written boustrophedon (“like an ox plows a field”), that is, left to right, then the next line right to left, and so on. Sometimes a point (dot) was used to break words.

Spacing is hardly standardized in languages using Latin script; French typography, especially in older books, is notably different from English or German, with different spacing between sentences or around certain punctuation. Then again, phrases or terms which in English or French would be multiple words are written as single “words” in German („Straßenkehrgerät“ == “Street Sweeper”). And I find Russian spacing rules disturbing.

And look at Arabic, which does have spacing but in calligraphy can grossly violate the bounds of what you might consider “running text” if you come from a European background.

The boustrophedon has one more rule that you omitted, and it’s extremely natural: when you write from right to left, you flip all the letters! My little daughter intuitively writes in this system even though no one taught her that, nor had she seen it anywhere. She explains that that way you know how to read longer passages that need to wrap over.
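For anyone who wants to see the line-direction rule in action, here is a rough sketch. It assumes a fixed line width and a hypothetical `boustrophedon` helper; plain strings can't mirror the glyphs themselves, so it only reverses the character order on every second line.

```python
import textwrap


def boustrophedon(text, width):
    """Wrap text to the given width and reverse every second line.

    Even-indexed lines run left to right; odd-indexed lines have their
    characters in reverse order, as if written right to left.
    """
    lines = textwrap.wrap(text, width)
    return [
        line if i % 2 == 0 else line[::-1]
        for i, line in enumerate(lines)
    ]


for line in boustrophedon("abc def ghi jkl", 7):
    print(line)
```

With a width of 7 the sample wraps into two lines, and the second one comes out reversed.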