|
|
|
|
|
by godelski
2487 days ago
|
|
Why did everyone assume it would be the same on average? This seems weird to me. Also, can you explain more about how the information density was calculated? Anything at the bit level seems crazy small to me. Words convey a lot of information. They cause your brain to create images, sounds, emotions, smells, etc. I guess we're calling language a compression of that? But even still, bits seems small. |
|
(see edit below; but i leave this up; it might be interesting, also) you mean that even for smaller sequences, the UID holds, right? the assumption was that even for a single sentence, there are a lot of ways to reduce or increase information density so that you get a smoother signal. e.g.: "It is clear that we have to help them to move on.", you could contract it to "it's clear we gotta help them move on" and contract it even further in the actual speech signal ('help'em'). or you could stretch it: "it is clear to us that we definitely have to help them in some way to move on", or alike. the assumption was that such increases / decreases would even be done to 'iron out' the very local peaks and troughs, particularly in speech.
bits: yeah, that took me a while to get used to, as well. the authors used (conditional) entropy as a way to measure information density (which is a good measure in this instance imv). and bits is just per definition the unit that comes out of information theoretical entropy: https://en.wikipedia.org/wiki/Entropy_(information_theory) . btw: while technically possible, i don't think that the comparison in the summary article between 39 bits in language and a xy bit modem is a helpful comparison. bits in the context of entropy are all about occurence and expectation in a given context. bits of a modem/in CS, they represent a low level information content for which we do not check context and expectation.
edit: ah, i realise you are asking why most in our community assumed that this universal rate applied across languages, right?
i guess the intuition was that all of us humans, no matter what language we speak, use the speech signal to transmit and receive information and that all of us have the same cognitive abilities. so the rate at which we convey information should be about the same. sure, there are probably differences according to some factors (spoken vs written language, differences in knowledge between speakers, etc.). but when the only factor that differs is English vs Hausa, esp. in spontaneous spoken language, then the information rate should be about the same.