by andimus 3841 days ago
The problem with that interpretation is that it ignores how far the results deviate from what pure chance would produce. If everything were completely random:

- 25% of the time, the actual and perceived ranking would be identical.

- 62.5% of the time, the difference would be at most one.

- Only in 37.5% of cases would there be a strong difference in scores (2 or more stars out of four).
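
Those three figures can be checked by brute force. The sketch below assumes a four-point scale (1 to 4 stars) and that, under "completely random", the actual and perceived rankings are independent and uniform; the names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

STARS = range(1, 5)  # assumption: ratings run from 1 to 4 stars

# All 16 equally likely (actual, perceived) pairs under independent uniform ratings.
pairs = list(product(STARS, repeat=2))


def share(predicate):
    """Fraction of pairs whose absolute star difference satisfies predicate."""
    hits = sum(1 for actual, perceived in pairs if predicate(abs(actual - perceived)))
    return Fraction(hits, len(pairs))


p_identical = share(lambda d: d == 0)    # exact match
p_within_one = share(lambda d: d <= 1)   # off by at most one star
p_two_or_more = share(lambda d: d >= 2)  # off by two or more stars

for label, value in [
    ("identical", p_identical),
    ("within one", p_within_one),
    ("two or more", p_two_or_more),
]:
    print(f"{label}: {float(value):.1%}")
```

Counting pairs gives 4/16, 10/16 and 6/16, which is where the 25%, 62.5% and 37.5% come from.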

Would you say that 44% is a good score on a multiple-choice test? Obviously candidates did better than random guessing, but not by enough to call it a strong correlation. Determining the strength of a correlation is what R-squared is for, which makes it the right choice for this analysis.

1 comments

But the problem with R-squared in this case is that a linear model isn't appropriate for ordinal data. These rankings are ordered categories, not numbers.

The comment I was replying to was not about ordinal data and I stand by it in its context.

But on your point, the jury's still out on what to do with Likert-type data like this. Purists do say that it's ordinal and therefore you can't do any numeric analysis on it, but practically speaking you can learn a lot from it by treating it as interval data.
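
To make the two readings concrete, here is a rough sketch that computes both on the same ratings: R-squared if you treat the stars as interval data, and plain exact-match accuracy if you treat them as ordinal categories. The ratings are invented purely for illustration and are not the data under discussion; `pearson_r` is a hand-rolled helper, not a library call.

```python
from math import sqrt

# Hypothetical 1-4 star ratings, made up for this example only.
actual = [1, 2, 2, 3, 3, 4, 4, 1, 2, 3]
perceived = [1, 2, 3, 3, 2, 4, 3, 2, 2, 4]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Interval reading: the stars are numbers, so a linear fit is meaningful.
r_squared = pearson_r(actual, perceived) ** 2

# Ordinal reading: only ask how often the two ratings agree exactly.
accuracy = sum(a == p for a, p in zip(actual, perceived)) / len(actual)

print(f"R-squared (interval reading): {r_squared:.3f}")
print(f"Exact-match accuracy (ordinal reading): {accuracy:.0%}")
```

The two numbers answer different questions, so they can't be compared directly; the interval reading gives partial credit for near misses, while the accuracy reading counts a one-star miss the same as a three-star miss.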

In my opinion, statistical analysis like this is ALWAYS two parts speculation and one part science (if you're lucky). It's designed to give insight without declaring fact, so you're allowed to bend the rules a little if it's in the spirit of the math.

My "gut" feeling agrees with the original post. If we treat the data as ordinal data, then we can only look at the accuracy, and 44% is, if anything, more compelling.

Cards on the table: I was involved in analyzing the data in the first place and have a vested interest in things, but I am saying what I believe without any intentional bias.