by AmirS2 4258 days ago

Looks like the author explicitly considered that:

> To try to understand whether people really were this bad at the task or whether perhaps the task itself was flawed, I ran some more stats. One thing I wanted to understand, in particular, was whether inter-rater agreement was high. In other words, when rating resumes, were participants disagreeing with each other more often than you’d expect to happen by chance? If so, then even if my criteria for whether each resume belonged to a strong candidate wasn’t perfect, the results would still be compelling

The Fleiss' kappa test the author ran next came back negative, i.e. the participants didn't agree with each other either. So the author's own judgement of the resumes may have been wrong, but that doesn't affect the conclusion.
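
For anyone who hasn't run into Fleiss' kappa before, here is a rough sketch of the standard formula in Python. The function name and the toy ratings are made up for illustration and are not the author's code or data. Values near 1 mean raters agree far more than chance would predict; values near 0 or below mean they agree no more than chance.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who put item i (e.g. a resume)
    into category j. Every item must have the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of rater pairs that agree on that item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items      # mean observed agreement
    p_chance = sum(p * p for p in p_cat)    # agreement expected by chance
    # Undefined (division by zero) if every rating is in a single category.
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical data: 4 raters, categories are [strong, weak].
unanimous = [[4, 0], [0, 4], [4, 0], [0, 4]]   # everyone agrees on every item
split = [[2, 2], [2, 2]]                       # raters split evenly every time

print(fleiss_kappa(unanimous))  # 1.0
print(fleiss_kappa(split))      # about -0.33
```

The even-split case is the kind of outcome the comment is describing: raters land on opposite sides about as often as the same side, so agreement is no better than chance.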