Hacker News
by borramakot 2386 days ago
> Adding up all the inaccurate redness ratings—“gray,” “pretty gray,” “whitish gray,” “muddy brown,” and so on—and averaging them leads us further away both from learning anything reliable about the individuals’ personal experiences of the rose and from the actual truth of how red our rose really is.

I don't understand this comment. How does averaging noisy signal, even systematically noisy signal, result in something that is noisier than any individual signal? I would have assumed the average would converge on (real signal + systematic error).
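A minimal simulation of that intuition, assuming each rating is the true value plus a fixed systematic offset plus independent Gaussian noise. The constants `TRUE_VALUE`, `BIAS`, and `NOISE_SD` are invented for illustration and do not come from the article:

```python
import random
import statistics

TRUE_VALUE = 7.0   # hypothetical "real" redness of the rose
BIAS = -2.0        # hypothetical systematic error shared by all raters
NOISE_SD = 1.5     # hypothetical spread of each rater's random error


def rating(rng):
    """One rater's score: truth + shared bias + independent noise."""
    return TRUE_VALUE + BIAS + rng.gauss(0.0, NOISE_SD)


def mean_rating(n, rng):
    """Average of n independent ratings."""
    return statistics.fmean(rating(rng) for _ in range(n))


rng = random.Random(0)
for n in (1, 8, 100, 10_000):
    print(n, round(mean_rating(n, rng), 3))
# As n grows, the averages settle near TRUE_VALUE + BIAS, not near TRUE_VALUE.
```

Under this model the average becomes more repeatable as raters are added, but it is repeatable around the biased value. Nothing in the model makes it noisier than a single rating.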

2 comments

The author is arguing that the real signal is zero and the systematic error is large, so you will always end up converging on a repeatable but useless value. Technically, taking only one sample could have gotten you closer, because there is a 50% chance that the random error goes in the opposite direction to the systematic error. The author is wrong to phrase that as some kind of advantage, though, because the other 50% of the time the random error makes the total error even worse.
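The "50% of the time closer, 50% of the time worse" point can be checked with a rough sketch. It assumes the same kind of model as the comment describes (truth plus a shared systematic error plus symmetric random error). All constants are hypothetical, and the single rating is simply taken to be the first of the eight:

```python
import random
import statistics

TRUE_VALUE, BIAS, NOISE_SD = 7.0, -2.0, 1.5  # hypothetical, for illustration only
N_RATERS, TRIALS = 8, 20_000


def rating(rng):
    """One rater's score: truth + shared bias + independent noise."""
    return TRUE_VALUE + BIAS + rng.gauss(0.0, NOISE_SD)


def compare(rng):
    """Compare one rating against the average of N_RATERS ratings."""
    single_wins = 0
    single_sq, avg_sq = [], []
    for _ in range(TRIALS):
        ratings = [rating(rng) for _ in range(N_RATERS)]
        single_err = abs(ratings[0] - TRUE_VALUE)            # first rater only
        avg_err = abs(statistics.fmean(ratings) - TRUE_VALUE)
        single_wins += single_err < avg_err                  # True counts as 1
        single_sq.append(single_err ** 2)
        avg_sq.append(avg_err ** 2)
    return (single_wins / TRIALS,
            statistics.fmean(single_sq),
            statistics.fmean(avg_sq))


win_rate, single_mse, avg_mse = compare(random.Random(0))
print(f"single rating closer to truth: {win_rate:.1%} of trials")
print(f"mean squared error, single rating: {single_mse:.2f}")
print(f"mean squared error, average of {N_RATERS}: {avg_mse:.2f}")
```

The win rate shows how often the lone rating happens to land closer. The mean squared errors show what that gamble costs when it goes the other way.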
They say later that

> When a feedback instrument surveys eight colleagues about your business acumen, your score of 3.79 is far greater a distortion than if it simply surveyed one person about you—the 3.79 number is all noise, no signal.

Which implies to me that they believe there is signal there, but that it goes away when aggregated?

I think by "surveyed" they don't mean "asked one person for a score" but rather "got some overall information from one person, including their qualitative feelings and perceptions." There is signal in those, as they discuss elsewhere in the article, but the quantitative rating allegedly has no value even when averaged. That's the charitable reading, anyway.
Yeah, I'd like to see what statistical theory they are using here. I don't think it's sound. It's unfortunate since I think the article is otherwise quite good.
If the methodology is unsound, there can be negative value and outcomes. E.g., active trading is a consistent loser for most people. Methodology can lead people astray.
? I still don't get it.
Averaging values with random divergence from the truth is useful. Averaging values with random divergence from nonsense is not.
Specifically, I don't get how one random value is supposed to be more accurate than the average of many random values.
I think that's more of a relative impact. When you have just one measurement, you know it's not particularly reliable. When you have a bunch, we are conditioned to think it's more reliable.

So in the latter case, the distance between its reliability and its perceived reliability is greater than in the former case.

I agree with you that it has to do with the perception of reliability. However, the article seems to state that the actual error is greater with more inputs. That's what I don't understand.

"We cannot remove the error by adding more data inputs and averaging them out, and doing that actually makes the error bigger."

I don't see how it "makes the error bigger". Maybe I'm being too literal and the writer is truly referring to the perception of the results carrying more weight, and therefore having a "bigger error".
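For the literal reading, a short worked decomposition may help. It assumes (this is a modeling assumption, not something the article states) that each rating is the truth $\mu$ plus a shared systematic error $b$ plus independent zero-mean noise with variance $\sigma^2$:

```latex
\begin{align}
x_i &= \mu + b + \varepsilon_i,
  \qquad \mathbb{E}[\varepsilon_i] = 0,
  \quad \operatorname{Var}(\varepsilon_i) = \sigma^2 \\
\bar{x}_n &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\mathbb{E}\big[(\bar{x}_n - \mu)^2\big] &= b^2 + \frac{\sigma^2}{n}
\end{align}
```

With $n = 1$ the expected squared error is $b^2 + \sigma^2$; as $n$ grows it falls toward $b^2$ and never rises. Under these assumptions averaging cannot remove the systematic part $b$, but it also cannot make the error bigger, which leaves the perception-based reading as the more plausible one.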

An individual rating can tell you something about the individual’s experience.

Averaging the ratings of multiple people tells you nothing since it washes out the individual experiences. I.e. individual samples hold meaning about the samples themselves but aggregation of samples is just noise.