by lenepp 2480 days ago
Apologies for being harsh, but this kind of thing is the phrenology of our time. I know it's utterly conventional to think this way about language in some circles that present themselves as doing legitimate science, but the view that you can calculate the amount of information in human speech, except in a super-technical sense that doesn't match any of the reporting on this study or the way people are interpreting it, has to be called out for the total nonsense that it is. It doesn't bear a moment's honest reflection.

And yes, I know information theory. It's language that these folks - many of them prominent and celebrated within their utterly normalized professions, just like in the days of phrenology - are fundamentally mistaken about. What quantity of information do you think there is in the word "trump," for instance? Is it the same over time, to bring up just one feature of how this funny thing called context informs human speech?

Wittgenstein's Philosophical Investigations is a good place to start if anyone's interested in understanding this issue.

4 comments

They aren't talking about the semantic information of the word "trump". They explain the methodology for calculating information, and it's per syllable (based on the number of distinct syllables that are part of the language's phonetics). So, for English speakers, 'trump' has exactly 7 bits in it. That exact syllable may or may not exist in another language, but if it does, the same single-syllable word "trump" would have a different number of bits to a speaker of that language. Maybe next time RTA?
In other words, they aren't factoring in compression.
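A rough sketch of the arithmetic behind the two comments above, taking the "number of distinct syllables" description at face value. The inventory size of 128 and the skewed frequencies are made-up numbers for illustration, not figures from the study. The first function assumes every syllable is equally likely; the second shows how the per-syllable figure drops once some syllables are much more common than others, which is roughly what "factoring in compression" would mean here.

```python
import math


def uniform_bits(inventory_size: int) -> float:
    """Bits per syllable if every syllable in the inventory is equally likely."""
    return math.log2(inventory_size)


def entropy_bits(probabilities: list[float]) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical inventory of 128 syllables, picked only because log2(128) = 7.
print(uniform_bits(128))  # 7.0

# Hypothetical skewed usage: one syllable takes half the probability mass,
# the remaining 127 share the other half evenly.
skewed = [0.5] + [0.5 / 127] * 127
print(entropy_bits(skewed))  # lower than the uniform figure
```

Neither number says anything about what the syllable means in context; both are properties of the syllable distribution only.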
>Maybe next time RTA?

I think it's you that has missed the point. Syllables have a very loose correlation to information. So great; we can stream out 39 bits' worth of syllables per second. In what way does that describe how information-dense those syllables are? Context matters here.

I think the fact that context matters so much is why we don't try to quantify it. The word 'trump' can convey a lot of meaning or next to nothing, e.g. in a card game the word trump can convey a lot of information to your competitors about the state of play and your reaction to it. It doesn't take any longer to say, and in the context of the game it may take less time to think up as well.
The researchers are not making any claims wrt semantic information density.
s/exactly 7/a little more than 7/
You're saying phonology is the new phrenology?

Jokes aside, I agree that estimating the average absolute information content of a syllable seems pretty absurd.

However, if the primary goal here was to determine whether some languages convey more information per unit time than other languages, I think the authors did fine. To this end, they needn't define information per syllable in anything other than p.d.u. - procedurally defined units. If average Vietnamese speech has 2x the number of syllables/min as German, but it takes the same amount of time to recite War and Peace in both Vietnamese and German, it suggests that both languages convey the same high-level information 'per unit time', but not 'per syllable'.

And basically that's all they did... "We computed the ratio between the number of syllables [in the text passage] and the duration [it took to recite the passage]"
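For what it's worth, here is a minimal sketch of that ratio and how it would feed into a bits-per-second figure. The syllable counts, durations, and bits-per-syllable values are invented so that both rows land on the 39 bits/second mentioned elsewhere in the thread; they are not data from the paper, and the function names are my own.

```python
def syllable_rate(syllables: int, seconds: float) -> float:
    """Syllables per second for one recited passage."""
    return syllables / seconds


def information_rate(syllables: int, seconds: float, bits_per_syllable: float) -> float:
    """Bits per second: syllable rate times an assumed per-syllable figure."""
    return syllable_rate(syllables, seconds) * bits_per_syllable


# Hypothetical readings of the same passage in two languages.
# (name, syllables in passage, seconds to recite, assumed bits per syllable)
readings = [
    ("fast, low-density language", 480, 60.0, 4.875),
    ("slow, high-density language", 300, 60.0, 7.8),
]

for name, syllables, seconds, bits in readings:
    rate = syllable_rate(syllables, seconds)
    info = information_rate(syllables, seconds, bits)
    print(f"{name}: {rate:.1f} syll/s, {info:.1f} bits/s")
```

The point of the toy numbers is just that very different syllable rates can come out at the same rate per unit time, which is the 'per unit time' versus 'per syllable' distinction made above.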

What do you see as the contradiction between Wittgenstein and information theory?
> And yes, I know information theory.

You clearly don't know linguistics, though, because the idea that a word conveys a constant quantity of information is hilarious.