| > But in reality, it would be very surprising if performance and evaluation of performance were independent. We expect people to be able to accurately rate their own ability. This seems to be attacking an irrelevant point in the analysis. The argument goes like this: a researcher carries out all the studies needed to prove the Dunning-Kruger effect, then trips and drops all the results into a vat of acid. But he's ashamed and quickly generates random numbers for the results, and somehow the data still proves the Dunning-Kruger effect. Not just that: repeating the same exercise again and again with completely random data leads to the same result, and the effect is always present. So is the Dunning-Kruger effect so powerful that it exists in the very fabric of the universe, devoid of any human interaction, or is something amiss? In this situation we are forced to look at the test that concluded from the data that the Dunning-Kruger effect exists, and conclude that it's a bad test; we need something different. You seem to be arguing "oh no, you can't look at random data, because we wouldn't expect the experiment to yield random data!". But that doesn't work as an argument for why the test should still be considered good. If it's supposed to have any worth, the test has to be able to come to one of two conclusions: the Dunning-Kruger effect exists, or the Dunning-Kruger effect doesn't exist. And if the test is set up so that it comes out positive for positive experimental results and for random noise alike, and comes out negative only in a narrow, extremely unlikely band of the possible outcome space, then the test is bad. Let's rephrase everything a bit to make the issue much clearer, and set up a coin-toss competition between ChatGPT and a group of 100 people. 
Each participant goes 1:1 against ChatGPT: both parties toss a coin and whoever has the most heads wins. On a draw they toss again, and if a pair goes into an infinite loop that doesn't end before our allotted trial time, they get removed from the study. A human assistant tosses on behalf of ChatGPT, on account of it not having arms yet. Now we ask each person how they would rate their ability vs. ChatGPT in a coin toss; everyone answers 50/50, for obvious reasons. So we run the experiment. The line for "ability plotted against ability" is a straight diagonal line. The line for estimated ability vs. actual ability is a straight flat line at 50%. Eureka! To the presses! We have just proven the Dunning-Coin-Kruger effect! People who are worse at throwing coins tend to overestimate their ability, and people who are better at throwing coins underestimate their ability! What a marvelous bit of psychological insight; it really tells us something about how the human mind works, and has broader insights about our society! But naturally we always expected this outcome: people who are bad at tossing coins are dumb and of course they are overconfident, not like people who are good at tossing coins, who have a remarkable intellect about them and are therefore humble in their self-estimation... and so on and on about preconceived biases that have nothing to do with the actual test we performed. |
So yes, both are null results, but only one is interesting.
For instance, we would probably expect there to be a correlation between height and ability to dunk a basketball. If someone were to show that there is not a correlation, that would be an interesting result. Just because random data would match my result doesn't mean my result is nonsensical. Getting results that look like random data is still a result--it just means there isn't a correlation.
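For anyone who wants to poke at the random-numbers thought experiment from the quoted comment, here is a rough Python sketch. It is only an illustration under my own assumptions, not the analysis from any actual study: "actual" and "self-estimated" percentiles are drawn as independent uniform numbers between 0 and 100, people are grouped into quartiles by actual score, and the function name `quartile_gaps` and its parameters are made up for this example.

```python
import random
import statistics


def quartile_gaps(n=10_000, seed=0):
    """Mean (self-estimate - actual score) for each quartile of actual score.

    Assumption for illustration: both numbers are independent uniform
    draws on [0, 100], so there is no real link between them.
    """
    rng = random.Random(seed)
    # Each person is (actual percentile, self-estimated percentile).
    people = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]
    people.sort(key=lambda p: p[0])  # order by actual score, lowest first

    size = n // 4
    gaps = []
    for q in range(4):
        group = people[q * size:(q + 1) * size]
        gaps.append(statistics.mean(est - act for act, est in group))
    return gaps


if __name__ == "__main__":
    for q, gap in enumerate(quartile_gaps(), start=1):
        print(f"quartile {q}: mean (estimate - actual) = {gap:+.1f}")
```

With these assumptions the average self-estimate sits near 50 in every quartile while the average actual score climbs from about 12.5 to about 87.5, so the bottom quartile appears to overestimate by roughly 37 points and the top quartile to underestimate by about the same amount. That is the familiar plot, produced by data with no correlation at all.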