Hacker News
by A1kmm 5394 days ago
> Since Japanese has more syllables than any other tested language

The article claims less information is encoded in each syllable, not more (as would be expected if more syllables were available).

The Japanese syllabaries (hiragana and katakana) are larger than the Roman alphabet, but the larger character set doesn't mean that Japanese has more syllables than English. English syllables are written using multiple Roman letters, and there are far more combinations possible than in hiragana or katakana (hiragana and katakana do allow small letters written between characters to modify the syllables represented, but even taking this into account, there are far more syllables possible when writing English).

On top of this, written hiragana or katakana maps unambiguously to the spoken language, but with English, there is more than one possible pronunciation for many character sequences, and the speaker often needs to know the word and sometimes even how it fits into the sentence to know which of several possible syllables to pronounce.


A nice segue into a key point about Linguistics, and some Japanese facts.

The study of Linguistics, explicitly, does not deal with orthography, or the written system of languages. There are of course exceptions with good reasons, but orthography systems are rarely, if ever, good representations of the systems of auditory communication that are formally considered languages. An orthography system can be heavily influenced by geopolitics (Chinese), have severe ambiguities (Arabic, the various Latin alphabet systems), or have been created retroactively (many languages of indigenous peoples). While it is convenient to map a spoken language to its related orthography when discussing topics such as syllables, inflection, and morphology, it is rarely appropriate when studying linguistics formally.

As for Japanese, while it is true that its writing system has a relatively straightforward mapping to its phonology, the mapping itself is, unfortunately, not unambiguous. Japanese has a pitch-accent system[1] that is not explicit in its orthography. There are examples of phonemically distinct words that are identical when written in hiragana/katakana.
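To make the ambiguity concrete, here is a small sketch using the commonly cited example はし (hashi) in Tokyo-standard pronunciation. The table layout, the `readings` helper, and the "H/L" pattern labels are assumptions chosen for illustration, not a standard notation or a complete dictionary.

```python
# One kana spelling, several words that differ only in pitch accent.
# The pattern strings are a simplified, hypothetical notation (H = high, L = low).
HOMOGRAPHS = {
    "はし": [
        ("箸", "chopsticks", "HL"),
        ("橋", "bridge", "LH, pitch drops on a following particle"),
        ("端", "edge", "LH, no drop on a following particle"),
    ],
}


def readings(kana: str) -> list[tuple[str, str, str]]:
    """Return the (word, gloss, pitch pattern) entries sharing this kana spelling."""
    return HOMOGRAPHS.get(kana, [])


for word, gloss, pattern in readings("はし"):
    print(f"{word} ({gloss}): {pattern}")
```

The kana alone cannot tell these apart; a reader needs context, or kanji, to pick the right one.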

Finally, there exists a moraic system[2] which sits between the phonemic and syllabic abstractions. Japanese, especially, has many phenomena that cannot be adequately modelled unless working in this in-between system.

[1]: http://en.wikipedia.org/wiki/Japanese_pitch_accent

[2]: http://en.wikipedia.org/wiki/Mora_%28linguistics%29

So basically, it's bus width vs. clock rate. Although it looks like there is a natural upper bound on throughput.
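A minimal sketch of the bus width vs. clock rate analogy: throughput is information per syllable multiplied by syllables per second. The function name and the numbers below are made up purely for illustration; they are not figures from the paper.

```python
def information_rate(bits_per_syllable: float, syllables_per_second: float) -> float:
    """Throughput = 'bus width' (bits per syllable) x 'clock rate' (syllables per second)."""
    return bits_per_syllable * syllables_per_second


# Hypothetical values: a dense, slower language and a sparse, faster one.
dense_slow = information_rate(bits_per_syllable=7.0, syllables_per_second=6.0)
sparse_fast = information_rate(bits_per_syllable=5.0, syllables_per_second=8.4)

print(dense_slow, sparse_fast)
```

With these invented inputs the two products come out about equal, which is the shape of the trade-off being described.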

One thing the article doesn't mention but the paper goes into is the syllabic complexity. Vietnamese and Chinese both have a ridiculous number of tones (from a Western perspective). From the paper:

  Language  Syllable Set  Weighted Syllabic Complexity
  English   7,931         2.48
  French    5,646         2.21
  German    4,207         2.68
  Italian   2,719         2.30
  Japanese  416           1.93
  Mandarin  1,191         3.58
  Spanish   1,593         2.40

English gets the density from a huge syllable set and an average syllabic complexity. Mandarin has a fairly small set but high complexity.
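As a rough way to read the syllable-set column: if every syllable in a set were equally likely (an assumption that is certainly false for real speech, and not the measure the paper uses), the most information one syllable could carry is log2 of the set size. The sketch below computes that ceiling from the table's numbers.

```python
import math

# Syllable set sizes copied from the table above.
SYLLABLE_SET = {
    "English": 7931,
    "French": 5646,
    "German": 4207,
    "Italian": 2719,
    "Japanese": 416,
    "Mandarin": 1191,
    "Spanish": 1593,
}


def max_bits_per_syllable(set_size: int) -> float:
    """Upper bound on bits per syllable, assuming a uniform distribution."""
    return math.log2(set_size)


bounds = {lang: max_bits_per_syllable(n) for lang, n in SYLLABLE_SET.items()}

# Print languages from the highest ceiling to the lowest.
for lang, bits in sorted(bounds.items(), key=lambda item: -item[1]):
    print(f"{lang:<9} {bits:.2f}")
```

This only bounds the per-syllable figure from above; actual density depends on how often each syllable is used and on context.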

From my experience with Japanese, it seems like it has evolved to compensate for the low density:

A lot of the pronouns (I/he/she) tend to be dropped and assumed from context

Some verb forms take the place of longer phrases: taberu koto ga dekimasu -> taberaremasu

In spoken/casual usage, many phrases are shortened: oiteoite -> oitoite. My personal favorites are arigato gozaimasu -> mumble-zaimasu and irasshaimase -> mumble-mase

Oitoite comes from oite oku, I believe: to put something somewhere and leave it in place.
Yeah, maybe not the best example because it's sort of a repetition of oku, meaning put it there (implied: so I can do something with it in the future). It just stuck in my head because I had heard it conversationally before I learned it in class and had a "ding" lightbulb moment.

Maybe a better example of ~teoite->~toite shortening is aketeoite (proper, 6 syllables), meaning open it (for some future purpose) -> aketoite (spoken/casual, 5 syllables)?
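The counts in that example can be checked with a deliberately naive sketch that counts vowel letters in the romanized form. This is an assumption-laden shortcut: it ignores the moraic n, doubled consonants, and long-vowel spellings, so it is not a general syllable or mora counter. The function name is hypothetical.

```python
VOWELS = set("aeiou")


def count_vowel_units(romaji: str) -> int:
    """Count vowel letters in a romanized word (naive; see the caveats above)."""
    return sum(1 for ch in romaji.lower() if ch in VOWELS)


for word in ("aketeoite", "aketoite"):
    print(word, count_vowel_units(word))
```

For these two words the counter gives 6 and 5, matching the figures in the comment.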

I would love to find the original paper and examine the actual measure of information they used. I have found that my mother tongue, Tamil, which is known for its speed, has a greater range for context-sensitive interpretations compared to English. So much so that I prefer English in any communication related to work.
Sorry, that was my point. I agree Japanese doesn't have more syllables, but rather a denser syllabic structure, with less information packed into a given amount x than the ambiguous standard they set.

Less information in space x and equal time to convey information = seemingly faster speech. Fairly intuitive.

>On top of this, written hiragana or katakana maps unambiguously to the spoken language

This is true in probably ~99% of cases, but not all.