by n1231231231234 2487 days ago
This is really cool. I am working in a related area and I think most of us have assumed that on average, the information rate is 'about the same' for the languages across the world. So it's exciting to see that their results confirm this assumption.

Two qualifying remarks.

1) The 'about the same' is important. Even in their data, there is still quite some variance. They found an average of 39 bits, with a standard deviation of 5. If the values are roughly normally distributed, that means about 1/3 of the data falls outside the range of 34-44 bits.

2) Which brings me to the uniform information density (UID) hypothesis. According to the UID, the language signal should be pretty smooth wrt how information is spread across it. For many years, the UID was thought to be pretty absolute: even across a unit like a sentence, it was thought that information would be spread pretty evenly. Now, there is an increasing amount of research showing that, esp. in spontaneous spoken language, there is a lot more variance within the signal, with considerable peaks and troughs spread across longer sequences.
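
For anyone who wants to check the arithmetic in remark 1: here is a minimal sketch, assuming the per-language rates are roughly normally distributed (that is an assumption of this sketch, not something the thread confirms). The helper name `fraction_outside` is made up for illustration.

```python
import math

def fraction_outside(k: float) -> float:
    """Share of a normal distribution lying more than k standard deviations from the mean."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return 1.0 - math.erf(k / math.sqrt(2.0))

# Figures quoted in remark 1; normality is assumed for this sketch.
mean_bits, sd_bits = 39.0, 5.0
low, high = mean_bits - sd_bits, mean_bits + sd_bits

outside = fraction_outside(1.0)
print(f"{outside:.1%} of values fall outside {low:.0f}-{high:.0f} bits")
# prints "31.7% of values fall outside 34-44 bits"
```

So 'about 1/3' is the usual one-standard-deviation rule of thumb; a skewed distribution would give a different share.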

2 comments

Why did everyone assume it would be the same on average? This seems weird to me.

Also, can you explain more about how the information density was calculated? Anything at the bit level seems crazy small to me. Words convey a lot of information. They cause your brain to create images, sounds, emotions, smells, etc. I guess we're calling language a compression of that? But even still, bits seems small.

> Why did everyone assume it would be the same on average? This seems weird to me.

(see edit below; but i'll leave this up; it might be interesting, too) you mean that even for smaller sequences, the UID holds, right? the assumption was that even for a single sentence, there are a lot of ways to reduce or increase information density so that you get a smoother signal. e.g.: "It is clear that we have to help them to move on." you could contract it to "it's clear we gotta help them move on" and contract it even further in the actual speech signal ('help'em'). or you could stretch it: "it is clear to us that we definitely have to help them in some way to move on", or the like. the assumption was that such increases / decreases would even be used to 'iron out' the very local peaks and troughs, particularly in speech.

bits: yeah, that took me a while to get used to, as well. the authors used (conditional) entropy as a way to measure information density (which is a good measure in this instance imv). and bits is just by definition the unit that comes out of information-theoretic entropy: https://en.wikipedia.org/wiki/Entropy_(information_theory) . btw: while technically possible, i don't think that the comparison in the summary article between 39 bits in language and an xy-bit modem is a helpful one. bits in the context of entropy are all about occurrence and expectation in a given context. bits of a modem/in CS represent low-level information content for which we do not check context and expectation.
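
to make the 'bits depend on context and expectation' point concrete, here is a toy sketch of entropy vs. conditional entropy in bits. it is not the authors' estimation procedure (the thread does not describe it); the function names and the toy sequence are made up for illustration.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy H(X) in bits, estimated from symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy_bits(symbols):
    """H(X_n | X_{n-1}) in bits, estimated from adjacent pairs."""
    pairs = list(zip(symbols, symbols[1:]))
    pair_counts = Counter(pairs)
    context_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (prev, _cur), c in pair_counts.items():
        p_pair = c / total                      # P(prev, cur)
        p_cur_given_prev = c / context_counts[prev]  # P(cur | prev)
        h -= p_pair * math.log2(p_cur_given_prev)
    return h

# Hypothetical toy "language": two symbols that strictly alternate.
toy = list("abababab")
h_plain = entropy_bits(toy)              # each symbol alone looks like a coin flip
h_context = conditional_entropy_bits(toy)  # but the previous symbol predicts the next
print(h_plain, h_context)
```

without context each symbol carries 1 bit; once you know the previous symbol, the next one is fully expected and carries 0 bits. that is the sense in which entropy bits are about expectation, not raw storage.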

edit: ah, i realise you are asking why most in our community assumed that this universal rate applied across languages, right?

i guess the intuition was that all of us humans, no matter what language we speak, use the speech signal to transmit and receive information and that all of us have the same cognitive abilities. so the rate at which we convey information should be about the same. sure, there are probably differences according to some factors (spoken vs written language, differences in knowledge between speakers, etc.). but when the only factor that differs is English vs Hausa, esp. in spontaneous spoken language, then the information rate should be about the same.

> esp. in spontaneous spoken language, then the information rate should be about the same.

This is entirely non-intuitive to me. I would think that with language evolving, some would be faster than others. If a language starts out conveying extremely simple thoughts, then it should take longer to convey certain things. I would then assume that as the language develops it gets better at conveying ideas. I would also think that thoughts can go much faster than we can process them with language. Like, I have constant thoughts that are really fast and can be complex. There's no internal dialogue there. But when I think with an internal dialogue it is much slower.

I think there is a distinction between "flux of incoming information" and "net knowledge gained by human as a result of incoming information".
After a few cocktails, once or twice, I've wondered with friends whether some "fuzzy" information rate constant might be a reference by which our brain understands the passage of time. In other words: if there is a fundamental processing rate of x/time, then theoretically, wouldn't our brains subconsciously use that for all kinds of neat reasons?

And the rate wouldn't have to be the exact same value for each individual, so long as the brain can attune its specific value to other reference points to time in nature.