by n1231231231234
2487 days ago
This is really cool. I am working in a related area, and I think most of us have assumed that, on average, the information rate is 'about the same' for languages across the world. So it's exciting to see that their results confirm this assumption. Two qualifying remarks.

1) The 'about the same' is important. Even in their data, there is still quite some variance. They found an average of 39 bits per second, with a stdev of 5. If the rates are roughly normally distributed, that means about 1/3 of the data falls outside the range of 34-44 bits per second.

2) Which brings me to the uniform information density (UID) hypothesis. According to the UID, the language signal should be pretty smooth with respect to how information is spread across it. For many years, the UID was thought to be pretty absolute: even across a unit like a sentence, information was thought to be spread pretty evenly. Now there is an increasing amount of research showing that, especially in spontaneous spoken language, there is a lot more variance within the signal, with considerable peaks and troughs spread across longer sequences.
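For anyone who wants to check the "about 1/3" figure, here is a quick sketch. It assumes the per-language rates are roughly normally distributed, which is my assumption and not something taken from the paper; the function name is just for illustration.

```python
import math


def fraction_outside(mean, stdev, low, high):
    """Share of a normal distribution that falls outside [low, high].

    Assumes normality, which is an illustrative assumption here.
    """
    def cdf(x):
        # Normal CDF written in terms of the error function.
        return 0.5 * (1.0 + math.erf((x - mean) / (stdev * math.sqrt(2.0))))

    return 1.0 - (cdf(high) - cdf(low))


# Mean of 39 and stdev of 5, so 34-44 is the one-stdev band.
share = fraction_outside(39.0, 5.0, 34.0, 44.0)
print(f"{share:.3f}")  # prints 0.317
```

So under that assumption roughly 32% of values land outside one stdev of the mean, which is where the "about 1/3" comes from. With a skewed or heavy-tailed distribution the share would differ.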
|
Also, can you explain more about how the information density was calculated? Anything at the bit level seems crazy small to me. Words convey a lot of information: they cause your brain to create images, sounds, emotions, smells, etc. I guess we're calling language a compression of that? But even so, bits seem small.