by weinzierl 3591 days ago
> It's possible to know how to say a word, but have no clue how to write it.

> Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it.

As a second-language learner of English, I can attest that this is not just a problem for languages written in logographic systems :-)

>The only aspect in which logographic systems win is information density.

I vaguely remember a paper claiming that information density is pretty much constant across languages and writing systems, but I can't find it right now. There is another thread on HN [1] where people compared the size of the "Universal Declaration of Human Rights" in different languages. I think this misses the point because it doesn't account for intra-character information density. It'd be much more interesting to render the text into a bitmap and then compare compressed bitmap sizes.

[1] https://news.ycombinator.com/item?id=8236135

1 comment

People like to joke about English spelling, but see farther down-thread for examples of how bad things are in logographic systems. Even native-speaking PhDs can forget how to write words like "sneeze" or "toad". It's a failure mode that simply doesn't exist in phonetic writing systems (even ones as imperfect as English's).

Sorry if it wasn't clear, but by "information density" I meant area on a page or screen, not digital bytes. In the thread you linked to, people correctly point out that digital information density depends on the encoding, and that compression schemes matter far more than language.

The paper you're probably thinking of is "A Cross-Language Perspective on Speech Information Rate"[1][2], which (as the title indicates) studied spoken language, not written. Annoyingly, the study was widely misrepresented in the media. It found that languages with lower information density tended to have higher syllabic rates. That is: Spanish contained less information per syllable than English or Mandarin, but Spanish speakers spoke faster to make up for that. Most media summaries of the paper omitted an important finding: the compensations didn't balance out. Different languages had different information rates. In the study, English had the highest. The runner-up (French) was 10% slower. And Japanese was 30% slower at conveying information.

1. http://ohll.ish-lyon.cnrs.fr/fulltext/pellegrino/Pellegrino_...

2. This blog post has a more accessible summary of the data: https://www.tofugu.com/japanese/why-do-japanese-people-talk-...