| HN Mirror


by expazl 980 days ago

> But in reality, it would be very surprising if performance and evaluation of performance were independent. We expect people to be able to accurately rate their own ability.

This seems to be attacking an irrelevant point in the analysis. The argument goes like this: a researcher carries out all the studies needed to prove the Dunning-Kruger effect, then trips and drops all the results into a vat of acid. Ashamed, he quickly generates random numbers to stand in for the results, and somehow the data still proves the Dunning-Kruger effect. Not only that, repeating the same exercise again and again with completely random data leads to the same result: the effect is always present. So is the Dunning-Kruger effect so powerful that it exists in the very fabric of the universe, devoid of any human interaction, or is something amiss?

In this situation we are forced to look at the test that concluded from the data that the Dunning-Kruger effect exists, and conclude that it's a bad test. We need something different.

You seem to be arguing "oh no, you can't look at random data, because we wouldn't expect the experiment to yield random data!". But that doesn't work as an argument for why the test should still be considered good. If it's supposed to have any worth, the test has to be able to come to one of two conclusions: the Dunning-Kruger effect exists, or the Dunning-Kruger effect doesn't exist. And if the test is set up so that it comes out positive for positive experimental results and for pure random noise alike, and comes out negative only in a narrow, extremely unlikely band of the possible outcome space, then the test is bad.
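For anyone who wants to try the vat-of-acid scenario themselves, here is a minimal sketch. It assumes scores and self-ratings are independent uniform draws on a 0-100 scale and that participants are grouped into quartiles by actual score; the function name and parameters are made up for illustration and are not taken from any of the linked posts.

```python
import random
import statistics

def quartile_means(n=10_000, seed=0):
    """Return (mean actual score, mean self-rating) for each quartile of actual score."""
    rng = random.Random(seed)
    # Assumption: both values are independent uniform noise,
    # so the self-rating carries no information about the score.
    actual = [rng.uniform(0, 100) for _ in range(n)]
    perceived = [rng.uniform(0, 100) for _ in range(n)]
    # Rank participants by actual score, then split into four equal groups.
    order = sorted(range(n), key=lambda i: actual[i])
    size = n // 4
    rows = []
    for q in range(4):
        idx = order[q * size:(q + 1) * size]
        rows.append((statistics.mean(actual[i] for i in idx),
                     statistics.mean(perceived[i] for i in idx)))
    return rows

rows = quartile_means()
for q, (a, p) in enumerate(rows, start=1):
    print(f"quartile {q}: actual {a:5.1f}  self-rated {p:5.1f}")
```

The self-rated column should hover around 50 in every quartile while the actual column climbs, which is the familiar "bottom overestimates, top underestimates" picture produced from pure noise.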

Let's rephrase everything a bit to make the issue much clearer. We set up a coin-toss competition between ChatGPT and a group of 100 people. Each participant goes 1:1 against ChatGPT: both parties toss a coin and whoever gets the most heads wins. On a draw they toss again, and any pair stuck in a loop that doesn't end before our allotted trial time is removed from the study. A human assistant tosses on behalf of ChatGPT, on account of it not having arms yet.

Now we ask each person how they would rate their ability vs. ChatGPT in a coin toss. Everyone answers 50/50, for obvious reasons.

So we run the experiment. The line for "ability plotted against ability" is a straight diagonal line. The line for estimated ability vs. actual ability is a straight flat line at 50%.

Eureka! To the presses! We have just proven the Dunning-Coin-Kruger effect! People who are worse at tossing coins tend to overestimate their ability, and people who are better at tossing coins underestimate their ability! What a marvelous bit of psychological insight. It really tells us something about how the human mind works, and has broader implications for our society! But naturally we always expected this outcome: people who are bad at tossing coins are dumb and of course they are overconfident, unlike people who are good at tossing coins, who have a remarkable intellect and are therefore humble in their self-estimation... and so on and on about preconceived biases that have nothing to do with the actual test we performed.
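The coin-toss study is easy to simulate too. A rough sketch, with one assumption that goes beyond the setup above: each person plays 20 matches instead of one, so that "ability" (win rate) has some spread to rank people by. All names here are hypothetical.

```python
import random
import statistics

def play_match(rng):
    """One match: both sides toss a coin; heads beats tails, draws are re-tossed."""
    while True:
        human = rng.random() < 0.5
        bot = rng.random() < 0.5
        if human != bot:
            return human  # True if the human won

def coin_toss_study(people=100, rounds=20, seed=1):
    """Return (mean win rate in %, self-rating in %) per quartile of win rate."""
    rng = random.Random(seed)
    # Assumption: `rounds` matches per person, so win rates differ by luck alone.
    win_rates = sorted(
        100 * sum(play_match(rng) for _ in range(rounds)) / rounds
        for _ in range(people)
    )
    self_rating = 50.0  # everyone answers 50/50
    size = people // 4
    return [(statistics.mean(win_rates[q * size:(q + 1) * size]), self_rating)
            for q in range(4)]

for q, (won, guessed) in enumerate(coin_toss_study(), start=1):
    print(f"quartile {q}: win rate {won:5.1f}%  self-rated {guessed:5.1f}%")
```

The bottom quartile "overestimates" and the top quartile "underestimates", even though nobody has any coin-tossing skill at all.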

2 comments

Gunax 977 days ago

But we would not expect the coin toss to have a correlation. Whereas we might expect a correlation between actual and perceived ability.

So yes, both are null results, but only one is interesting.

For instance, we would probably expect there to be a correlation between height and ability to dunk a basketball. If someone were to show that there is not a correlation, that would be an interesting result. Just because random data would match my result doesn't mean my result is nonsensical. Getting results that look like random data is still a result--it just means there isn't a correlation.

igor47 978 days ago

Thanks, this is a great way to rephrase the OP to bring out the salient parts.

igor47 978 days ago

thinking about this more (i'm replying to myself!) -- i guess what the experiments for D/K show is exactly that performance on a test is uncorrelated with your idea of the performance on a test.

yes, it's kind of surprising that, having dropped the "real" results in a vat of acid, our hapless researcher replaces the missing data with random numbers and gets the same result -- but that's only because we didn't expect random numbers to model the outcome.

instead, we would have expected that, towards the bottom of the distribution of test-takers, those folks would rate themselves lower, while towards the top they would rate themselves higher. at the extreme of perfect self-awareness, the line for subjective results would exactly match the line for objectively-scored results.

this is the exact argument that is made in the post linked in the top comment:

> by using random data to argue that the Dunning-Kruger effect is not real, the author is arguing to default to the base assumption. But which base assumption do they make? One even more radical than what’s proposed by Dunning-Kruger. In the author’s world, the Dunning-Kruger study should be interpreted in the reverse direction, claiming that there is at least some self-awareness in the way people self-assess.

source: https://andersource.dev/2022/04/19/dk-autocorrelation.html
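to make the "what would we have expected" point concrete, here's a small sketch comparing no self-awareness, partial self-awareness, and perfect self-awareness. the blend model (self-rating = awareness * true score + (1 - awareness) * noise) is my own assumption for illustration, not something taken from the linked post or from the D/K paper.

```python
import random
import statistics

def self_rated_by_quartile(awareness, n=10_000, seed=0):
    """Mean self-rating per quartile of actual score, for a given awareness in [0, 1]."""
    rng = random.Random(seed)
    # Actual scores, sorted so that consecutive slices are the quartiles.
    actual = sorted(rng.uniform(0, 100) for _ in range(n))
    # Hypothetical model: a weighted blend of the true score and pure noise.
    rated = [awareness * a + (1 - awareness) * rng.uniform(0, 100) for a in actual]
    size = n // 4
    return [round(statistics.mean(rated[q * size:(q + 1) * size]), 1)
            for q in range(4)]

for awareness in (0.0, 0.5, 1.0):
    print(awareness, self_rated_by_quartile(awareness))
```

with awareness 0 the self-rated line is flat near 50 (the random-data case), with awareness 1 it matches the objective line, and anything in between gives a line that rises but more gently than the objective one.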