by rett12 3593 days ago
Native speakers seem to do fine. Learning a language while growing up, with hiragana as a helper and all your media written in Japanese, makes everything easier. By the time they finish school they know enough Japanese to get by. It's obviously different for non-native people.

Also, it's not like you stop learning after school. For example, according to the Oxford dictionary, English has 171,476 words in current use, excluding inflections, and several technical and regional vocabularies. Do all English university students know these words?


Logographic systems have some major disadvantages:

• It's possible to know how to say a word, but have no clue how to write it. This phenomenon is called character amnesia, and it affects most native speakers.[1] Phonetic languages allow you to write out a misspelled word, which readers can understand (or autocorrect can fix).

• Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it. This is extra-fun in Japanese, where most kanji have multiple pronunciations.

• Looking up words is harder, as there are no "letters" to sort by. Sorting can be done by stroke count, by radical (four corners or SKIP), or by phonetic spelling (in pinyin or hiragana). Modern technology has made this easier, and some phone apps (like Pleco) can even OCR hanzi. Still, it's far less convenient than phonetic languages.

The only aspect in which logographic systems win is information density. You can fit more words on a single page. This is obvious if you've ever seen Chinese or Japanese copies of works that were originally written in English. The Harry Potter books are crazy thin. Also, Chinese and Japanese tweets can express a paragraph of information.

1. https://en.wikipedia.org/wiki/Character_amnesia

> It's possible to know how to say a word, but have no clue how to write it.

> Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it.

As a second language learner of English I can attest that this is not just a problem of languages written in logographic systems:-)

>The only aspect in which logographic systems win is information density.

I vaguely remember a paper that claimed that information density is pretty much constant across languages and writing systems, but I can't find it right now. There is another thread on HN [1] where people compared the size of the "Universal Declaration of Human Rights" in different languages. I think this misses the point because it doesn't account for intra-character information density. It'd be much more interesting to render the text into a bitmap and then compare compressed bitmap sizes.

[1] https://news.ycombinator.com/item?id=8236135

People like to joke about English spelling, but see farther down-thread for examples of how bad things are in logographic systems. Even native-speaking PhDs can forget how to write words like "sneeze" or "toad". It's a failure mode that simply doesn't exist in phonetic languages (even ones as imperfect as English).

Sorry if it wasn't clear, but by "information density" I meant area on a page or screen, not digital bytes. In the thread you linked to, people correctly point out that digital information density depends on encoding, and that compression schemes matter far more than language.
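To make the encoding point concrete, here is a minimal Python sketch that measures the same sentence under different encodings and after compression. The sample sentences (Article 1 of the declaration in English, Chinese, and Japanese, quoted from memory) and the `sizes` helper are illustrative assumptions, not anything posted in either thread.

```python
import zlib


def sizes(text: str) -> dict:
    """Measure one string as characters, encoded bytes, and compressed bytes."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return {
        "chars": len(text),
        "utf8_bytes": len(utf8),
        "utf16_bytes": len(utf16),
        "utf8_zlib_bytes": len(zlib.compress(utf8, 9)),
    }


# Assumption: sample sentences chosen for illustration only.
SAMPLES = {
    "en": "All human beings are born free and equal in dignity and rights.",
    "zh": "人人生而自由,在尊严和权利上一律平等。",
    "ja": "すべての人間は、生まれながらにして自由であり、かつ、尊厳と権利とについて平等である。",
}

for lang, text in SAMPLES.items():
    print(lang, sizes(text))
```

On strings this short, zlib's fixed overhead dominates the compressed size, so a fair comparison would need the full text rather than one sentence.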

The paper you're probably thinking of is A Cross-Language Perspective on Speech Information Rate[1][2], which (as the title indicates) studied spoken language, not written. Annoyingly, the study was widely misrepresented in the media. It found that languages with lower information density tended to have higher syllabic rates. That is: Spanish contained less information per syllable than English or Mandarin, but Spanish speakers spoke faster to make up for that. Most media summaries of the paper omitted an important finding: the compensations didn't balance out. Different languages had different information rates. In the study, English had the highest. The runner-up (French) was 10% slower. And Japanese was 30% slower at conveying information.

1. http://ohll.ish-lyon.cnrs.fr/fulltext/pellegrino/Pellegrino_...

2. This blog post has a more accessible summarization of the data: https://www.tofugu.com/japanese/why-do-japanese-people-talk-...

>Phonetic languages allow you to write out a misspelled word, which readers can understand (or autocorrect can fix).

You can certainly write things out in kana. When I was more serious about studying Japanese, I knew fewer than 1,000 kanji but had a vocabulary several times that size, and would at times write out the word I meant in hiragana. And if we're counting autocorrect, your IME is going to take that hiragana and let you find the character.

>• Looking up words is harder, as there are no "letters" to sort by. Sorting can be done by stroke count, by radical (four corners or SKIP), or by phonetic spelling (in pinyin or hiragana). Modern technology has made this easier, and some phone apps (like Pleco) can even OCR hanzi. Still, it's far less convenient than phonetic languages.

Eh, I disagree here. It's harder if you're used to looking things up by spelling, but once you're fast at looking things up by radical, it's not that difficult. My misguided attempts at slogging through 1Q84 while reading at, at best, a middle school level got me pretty fast at looking up kanji. Not any appreciable difference vs. looking things up in a regular dictionary.

You cannot write things out in kana in Chinese. As such, GP's point against logographic writing systems stands, notwithstanding mixed writing systems such as Japanese.

Even without autocorrect, you can write a word in English such that most people would understand. Of course, in a logographic system you'd just write a homophone (which is what people actually do, write a simpler word pronounced the same).

As for looking things up, alphabetical order is in principle easier, though. You only need to learn the order of about 26 things, not about 200, and can then run an iterative binary search over it without having to switch to stroke count. Looking things up by radical is possible, of course.
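The binary search mentioned above can be sketched in a few lines of Python. The word list and the `binary_search` helper are made up for illustration; the point is only that a sorted list of n entries needs roughly log2(n) comparisons.

```python
def binary_search(entries, target):
    """Search a sorted list; return (index or -1, number of comparisons made)."""
    lo, hi = 0, len(entries)
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if entries[mid] < target:
            lo = mid + 1  # target sorts after the middle entry
        else:
            hi = mid      # target is the middle entry or sorts before it
    found = lo < len(entries) and entries[lo] == target
    return (lo if found else -1), steps


# Hypothetical mini-dictionary. Python compares strings by code point,
# which matches alphabetical order for lowercase ASCII words.
entries = sorted(["apple", "toad", "sneeze", "ocean", "kanji",
                  "radical", "stroke", "zebra"])

print(binary_search(entries, "sneeze"))
print(binary_search(entries, "oshean"))
```

The same search works for any ordering a reader has memorized; the argument in the comment is that a 26-letter order is cheaper to memorize than an order over roughly 200 radicals.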

Some upper and lower case letters have no clear resemblance (see Aa, Rr, Gg, Nn), so one has to learn 52 symbols. Add another 52 symbols for script, if you have to. Then, in the case of English, learn how to pronounce or spell words, because in some cases there are no rules (why ocean and not oshean? Because of derivation from Greek, still...).

Anyway, any alphabet is better than Chinese characters.

>• It's possible to know how to say a word, but have no clue how to write it. This phenomenon is called character amnesia, and it affects most native speakers.[1] Phonetic languages allow you to write out a misspelled word, which readers can understand (or autocorrect can fix).
>
>• Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it. This is extra-fun in Japanese, where most kanji have multiple pronunciations.

I don't think English is much better in these cases. In fact, the writing can be so divorced from speech that spelling bees are a thing.

I've had Chinese colleagues who, when asked to write a word they'd just used in a sentence, were simply unable to. At first I thought they were playing a joke on me. But nope, they'd just forgotten the appropriate hanzi, and they couldn't even hazard a guess. It's a totally different failure mode than imperfectly-phonetic languages like English.

From Why Chinese Is So Damn Hard[0]:

> I was once at a luncheon with three Ph.D. students in the Chinese Department at Peking University, all native Chinese (one from Hong Kong). I happened to have a cold that day, and was trying to write a brief note to a friend canceling an appointment that day. I found that I couldn't remember how to write the character 嚔, as in da penti 打喷嚔 "to sneeze". I asked my three friends how to write the character, and to my surprise, all three of them simply shrugged in sheepish embarrassment. Not one of them could correctly produce the character. Now, Peking University is usually considered the "Harvard of China". Can you imagine three Ph.D. students in English at Harvard forgetting how to write the English word "sneeze"?? Yet this state of affairs is by no means uncommon in China. English is simply orders of magnitude easier to write and remember. No matter how low-frequency the word is, or how unorthodox the spelling, the English speaker can always come up with something, simply because there has to be some correspondence between sound and spelling.

0: http://www.pinyin.info/readings/texts/moser.html

To be fair, you can also "come up with something" in Chinese. Since there aren't all that many sounds, you can write in generic characters for the sound of the word that you can't remember.

Yep. The analogy I use is, it's a bit like if someone walked up and asked you to draw the logo of this or that company. Even if you've seen the logo a million times, you might not be able to summon up a mental picture of it, or you might remember the rough shape but have no idea how many lines go where.

I've never heard the term "character amnesia", but it's analogous to my situation.

I can read and write (via pinyin) a large number of characters, but cannot recollect their shape in abstraction.

I think that's just because, as a foreigner learning Chinese in the modern world, I've never had to learn this skill.

The difference between Recollection and Recognition.

Same here - and strangely enough, it's rarely a problem. Faking characters by using the correct radical and a random homophone base character works okay in a pinch.

But because I never write characters by hand, I have a really hard time reading handwritten notes, and that is a problem.

> or autocorrect can fix

If you're bringing computers into it, isn't text entry in Japanese usually done phonetically anyway?

> For example English has according to the Oxford dictionary 171,476 words in current use excluding inflections, and several technical and regional vocabularies.

Here is a website that quizzes you on a random sample of words from an English dictionary, mixed with randomly generated non-words. It then estimates the percentage of English words you know.

http://vocabulary.ugent.be/wordtest/start

I am a non-native speaker, and I scored in the 77% to 89% range over several attempts at this test.
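A rough Python sketch of how a yes/no test like this could turn answers into an estimate. It assumes the score is the share of real words accepted minus the share of non-words accepted; that correction rule and the `estimate_known` helper are assumptions for illustration, not a confirmed description of how the site computes its result.

```python
def estimate_known(responses):
    """Estimate the fraction of words known from yes/no answers.

    responses: list of (is_real_word, said_yes) pairs.
    Assumed rule: hit rate on real words minus false-alarm rate on non-words.
    """
    real = [said_yes for is_real, said_yes in responses if is_real]
    fake = [said_yes for is_real, said_yes in responses if not is_real]
    if not real:
        raise ValueError("need at least one real word to estimate anything")
    hit_rate = sum(real) / len(real)
    false_alarm_rate = sum(fake) / len(fake) if fake else 0.0
    # Clamp at zero so heavy guessing cannot produce a negative estimate.
    return max(0.0, hit_rate - false_alarm_rate)


# Hypothetical answers: 4 real words (3 accepted), 2 non-words (none accepted).
careful = [(True, True), (True, True), (True, True), (True, False),
           (False, False), (False, False)]
print(estimate_known(careful))
```

Under this assumed rule, saying yes to a word you only half recognize raises the hit rate, while saying yes to a non-word pulls the estimate back down.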

I'm curious: did you only answer yes to the words whose meanings you knew, or to anything that you knew was indeed a word? There were some that were pretty obviously words, but I wasn't certain of the exact meaning (although I could guess), so I answered no. Ended up with 77% (as a native speaker). Apparently the average for native speakers is 67%, so 77-89 as a non-native speaker sounds really good.

I just did it, and I answered yes to words I knew, or that I knew were actual words even though I didn't know their exact meaning. Like argon: I know it is something related to chemistry, but I don't actually know what it is. Some words were compound words, which I am not sure would be in a dictionary but are still valid words.

I got 73% and I didn't say 'yes' to any fake words.

73% is apparently "This is a high level for a native speaker."