Hacker News
by akjetma 2483 days ago
Scientists started with written texts from 17 languages, including English, Italian, Japanese, and Vietnamese. They calculated the information density of each language in bits—the same unit that describes how quickly your cellphone, laptop, or computer modem transmits information. They found that Japanese, which has only 643 syllables, had an information density of about 5 bits per syllable, whereas English, with its 6949 syllables, had a density of just over 7 bits per syllable. Vietnamese, with its complex system of six tones (each of which can further differentiate a syllable), topped the charts at 8 bits per syllable.

How can you encode 643 syllables using 5 bits? Same for 6949 syllables and 7 bits?

1 comment

If I understand this correctly, it isn't that they are uniquely encoding each syllable. It's that they are encoding the information in each syllable. Many syllables have very low information content and must be combined with other syllables to convey information. Many other syllables are redundant.
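
To put rough numbers on that distinction: a unique fixed-length code for 643 syllables needs log2(643), about 9.3 bits, and 6949 syllables need about 12.8 bits. The average information per syllable (Shannon entropy) can be well below that when some syllables are far more common than others. Here is a minimal Python sketch. The Zipf-like frequency distribution and the helper names are assumptions for illustration only, not the study's data or method, so the entropy values it prints are not expected to match the article's figures.

```python
import math

def uniform_bits(n_syllables):
    """Bits needed to give each of n equally likely syllables a unique code."""
    return math.log2(n_syllables)

def entropy_bits(probs):
    """Shannon entropy in bits: average information per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def zipf_probs(n, s=1.0):
    """Hypothetical skewed frequencies: probability proportional to 1 / rank**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Syllable counts are taken from the quoted article; the frequency
# distribution is an assumption, not measured data.
for name, n in [("Japanese", 643), ("English", 6949)]:
    print(f"{name}: {n} syllables")
    print(f"  unique fixed-length code: {uniform_bits(n):.2f} bits")
    print(f"  entropy under assumed Zipf frequencies: {entropy_bits(zipf_probs(n)):.2f} bits")
```

The only point is that entropy never exceeds log2 of the inventory size and drops as the frequencies get more uneven, so a per-syllable figure of 5 or 7 bits does not have to be enough to index every syllable.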