| That is a fun experiment. I studied linguistics in college, and I do not think anyone ever discussed the textual density of different languages given the "same" content (the latter part would be its own terrifying chestnut; if you have not studied machine translation and semantic eval, good luck ever confirming such a statement). I studied Arabic a lot, and Chinese for about a year. I cannot speak to Chinese with only one hazy year under my belt, but I can speak to Arabic. Because Arabic has lots of syntax realized at the morphological level, you can encode a whole sentence (subject (declension inherent and gender variable), verb conjugated (passive/active, past/present/future, standard/subjunctive), and direct object (declension inherent and gender variable)) all in one word, as we know the term in English. أضربه (A-dr-b-u; a (I) dr-b (hit) u (him/it)): I hit him (present tense). And that is a super simple example. I have seen much more complicated sentences in one word, and even better ones in two or three. So I hypothesized that Arabic is very, very dense. I think Russian and others could be considered similar. However, even with this level of density (maybe we argue "compression" from a CS perspective), I noticed that books and their translations were routinely about the same length in pages. Never identical, mind you, but never something crazy like 50 pages more (I am guessing; it has been a long time since I made such an experiment, and I would have trouble agreeing with someone on what is significant). Now, one could hypothesize a shitload about what this means, but in programming languages computation is realized as the same "stuff" (machine code instructions), whereas no parallel exists for mapping human language to computation, as far as I know from my somewhere-between-a-minor-and-a-major course load in linguistics, specifically computational linguistics. If someone can contradict me, I would LOVE to read about measured cognition and language constructs. |
In terms of information density per syllable, Mandarin wins, with English coming in a close second. When speaking, English usually has more syllables per unit time than Mandarin, so English ends up with the highest spoken information rate (information per unit time) of the languages compared. Japanese is on the opposite end of the spectrum. Despite having the highest syllabic rate, it has the lowest information density.[1]
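The arithmetic behind that comparison is just density per syllable times syllables per second. Here is a minimal sketch of it in Python. The numbers are hypothetical round values I picked only to show how a lower per-syllable density can still win on rate. They are not the figures from the paper in [1], and the function names are mine.

```python
# Hypothetical, round-number inputs chosen only to illustrate the arithmetic.
# They are NOT the measured figures from the cited paper.
# Each entry: (information per syllable, syllables per second)
SAMPLES = {
    "Mandarin": (1.0, 5.0),
    "English": (0.9, 6.0),
    "Japanese": (0.5, 8.0),
}


def information_rate(density_per_syllable, syllables_per_second):
    """Information conveyed per second = density per syllable * syllable rate."""
    return density_per_syllable * syllables_per_second


def rank_by_rate(samples):
    """Return language names sorted from highest to lowest information rate."""
    rates = {
        lang: information_rate(density, rate)
        for lang, (density, rate) in samples.items()
    }
    return sorted(rates, key=rates.get, reverse=True)


if __name__ == "__main__":
    for lang in rank_by_rate(SAMPLES):
        density, rate = SAMPLES[lang]
        print(f"{lang}: {information_rate(density, rate):.2f} units per second")
```

With these made-up inputs, the densest language per syllable is not the fastest carrier of information per second, and the language with the fastest syllables comes last. That is the shape of the claim above, nothing more.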
For written information density, languages written with logographic scripts win. This is pretty obvious if you've seen a Chinese or Japanese translation of something familiar, such as a Harry Potter book. They're ludicrously thin.
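If you want to poke at written density yourself, a rough sketch like the one below counts characters (Unicode code points) and UTF-8 bytes for the same short sentence. It reuses the Arabic example from the parent comment. The Chinese rendering 我打他 is my own addition, and `measure` is just a helper name I made up. Character counts say nothing about glyph width or page layout, so treat this as a toy and not a measurement of book thickness.

```python
# Toy comparison of one short sentence in three scripts.
# The Arabic word is the parent comment's example; the Chinese rendering
# is an assumption added here for illustration.
SENTENCES = {
    "English": "I hit him",
    "Arabic": "\u0623\u0636\u0631\u0628\u0647",  # أضربه
    "Chinese": "\u6211\u6253\u4ed6",  # 我打他
}


def measure(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    for lang, text in SENTENCES.items():
        chars, size = measure(text)
        print(f"{lang}: {chars} characters, {size} UTF-8 bytes")
```

Note how the answer depends on what you count: the Chinese version is the shortest in characters, but it is no smaller than the English one once encoded as UTF-8.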
1. See the figures at the end of this paper: http://www.ddl.ish-lyon.cnrs.fr/fulltext/pellegrino/Pellegri...