by lysium 5394 days ago
Hm, .91 'information per syllable' for English at an average rate of 6.19 syllables per second gives 5.6 'information' per second, vs. .49 * 7.84 = 3.8 for Spanish.

How is 5.6 "more or less identical amount of information" as 3.8? That's a 47% difference!
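
For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes, as the comment does, that information rate is simply density times syllable rate, which may not be how the paper computes it. The function and variable names are made up for illustration; the four numbers are the ones quoted above.

```python
# Figures as quoted in the comment: information per syllable, syllables per second.
def info_rate(density, syllables_per_second):
    """Naive information rate: density multiplied by speech rate (an assumption)."""
    return density * syllables_per_second

english = info_rate(0.91, 6.19)
spanish = info_rate(0.49, 7.84)

# How much larger the English figure is, relative to the Spanish one.
relative_difference = (english - spanish) / spanish

print(f"English: {english:.2f}")
print(f"Spanish: {spanish:.2f}")
print(f"English exceeds Spanish by {relative_difference:.0%}")
```

Under that naive reading the gap comes out at roughly 47%, matching the figure in the comment.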

Just from the information available in the Time article, and from a tiny bit of thinking, I presume there are two things going on here:

First, the article says the information density of Vietnamese is arbitrarily set to 1. There are many common ways to normalize measurements, and it is obviously not 1/SPSvietnamese in this case. Most likely other factors go into the information density calculation (whose unit we don't actually know, btw, though it is presented as a ratio, otherwise they wouldn't normalize to 1), or they could just be doing some other statistical fudging.

Second, with a little bit of thinking you could realize that you aren't getting a scientifically sound write-up from the Time article, mostly because of how they present things (consumable for the masses!). The article's writer could be picking completely arbitrary measures as important for people to puzzle over and say "Wow!" at, while ignoring the real results. It has happened countless times in the past and will continue to happen for the foreseeable future.

Basically what I am saying: if you want to do the incredulous thing, please put some thought in first.

> if you want to do the incredulous thing, please put some thought in first

Didn't I just do that? You seem to assume I am incredulous about the paper whereas I commented on the article.

Putting some thought into it: I think you multiply, not add, information density, but there are also limits on how complex a message you can decode. Consider that the information density of each syllable is also limited by the grammar used.

1 cm left 4 cm up

If you drop '4 cm up' you go from 2D to 1D, but replacing 'up' with 'blue' leaves you with something that does not mean anything.

It sounds like the information density of a given syllable is decided subjectively. The subjective measurements are probably not proportional. That makes sense: imagine you're tasked with giving a numerical value for how much information is in a sentence. Working through the sentence syllable by syllable, picking an arbitrary 'information score' for each syllable, and then adding the scores together won't get you a particularly good result.

The conclusion of the article (that all languages transmit data at the same rate) is supported by the research, but not proved by it. I suspect that if you read the paper, the researchers wouldn't make it sound so conclusive.