by carlmr 1130 days ago
>Most Mandarin words are disyllabic or longer, and 400×400 = 160k is enough combinations for a quite large vocabulary.

While true, I'd bet that some combinations dominate because they sound better/are easier to pronounce.

Also, just because you can technically differentiate 160k sound pairs doesn't mean you can do it in a noisy environment.

Japanese and Korean have a similarly limited number of syllables and have very long words compared to English, I'm guessing because they don't have tones.

If you look at communication theory, you don't only need distinct sounds, you also need error correction, which requires extra bits of redundant information.

Tones just make it possible to carry extra bits.

Longer strings of syllables like in Japanese and Korean do the same.

More complex syllables, like in English, too.

It's just multiple different ways of carrying enough bits in speech to work in a noisy environment.
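
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every syllable sequence is equally likely (real languages are far from that, so treat these as upper bounds). The 400 comes from the quoted figure; the 4 tones and the `bits_per_word` helper are my own assumptions for illustration.

```python
import math

def bits_per_word(syllable_inventory: int, syllables_per_word: int) -> float:
    """Upper bound on bits per word if all syllable sequences were equally likely."""
    return syllables_per_word * math.log2(syllable_inventory)

# 400 toneless syllables, two per word (the 400 x 400 = 160k figure quoted above).
two_syllables = bits_per_word(400, 2)

# Assumption: 4 tones multiply the inventory to 1600 distinct syllables.
two_syllables_with_tones = bits_per_word(400 * 4, 2)

# No tones, but one more syllable per word instead.
three_syllables = bits_per_word(400, 3)

print(f"2 syllables, no tones:   {two_syllables:.1f} bits")
print(f"2 syllables, with tones: {two_syllables_with_tones:.1f} bits")
print(f"3 syllables, no tones:   {three_syllables:.1f} bits")
```

Under these assumptions, tones add 2 bits per syllable, and a third toneless syllable adds more than that. Either way you gain headroom that can be spent on redundancy.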

Another analogy could be password strength. You can have a very long numeric password (Japanese & Korean), a medium-length password with a mix of a-zA-Z0-9 (English), or a shorter password with weird special characters (Chinese), and they can all end up having the same entropy (given that the password rules are known to the attacker).
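
A quick sketch of the password analogy, assuming each character is chosen uniformly at random. The lengths (18, 10, 9) and the 94-character printable ASCII set are numbers I picked to make the point, not anything measured from the languages.

```python
import math

def password_entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a password whose characters are drawn uniformly at random."""
    return length * math.log2(alphabet_size)

# Hypothetical lengths chosen so the three schemes land close together.
numeric = password_entropy_bits(10, 18)       # digits only, long
alphanumeric = password_entropy_bits(62, 10)  # a-zA-Z0-9, medium length
with_symbols = password_entropy_bits(94, 9)   # printable ASCII incl. symbols, shorter

for name, bits in [("numeric", numeric),
                   ("alphanumeric", alphanumeric),
                   ("with symbols", with_symbols)]:
    print(f"{name:>12}: {bits:.1f} bits")
```

All three come out within about a bit of each other, so a smaller alphabet is paid for with length and a richer alphabet buys a shorter password.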