Hacker News
by _delirium 5545 days ago
The prediction context is somewhat different from the analysis context, I think. For predictive recommender systems, the most relevant part of this analysis is the critique of error measures. It may well be that MSE does not align with the system's actual accuracy goals (e.g. something like the perceived quality of the recommendations).
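To make the mismatch concrete, here is a minimal sketch with made-up numbers (the ratings and both models are hypothetical, not taken from any real system). It assumes that "perceived quality" can be crudely approximated by whether the top recommendation is the item the user actually rates highest, which is only one of many possible stand-ins.

```python
def mse(preds, truth):
    """Mean squared error between predicted and true ratings."""
    return sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)

def top_pick(scores):
    """Index of the highest-scored item, i.e. what would be recommended first."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical true ratings for five items, and two hypothetical models.
truth = [5, 4, 3, 2, 1]
model_x = [4.0, 4.5, 3.0, 2.0, 1.0]  # close on every item, but swaps the top two
model_y = [3.0, 2.5, 2.0, 1.5, 1.0]  # compressed scale, but the ordering is right

for name, preds in [("x", model_x), ("y", model_y)]:
    correct = top_pick(preds) == top_pick(truth)
    print(name, "MSE:", mse(preds, truth), "top pick correct:", correct)
```

Model x wins on MSE while recommending the wrong item first; model y loses on MSE but gets the ordering right. Which one is "better" depends entirely on what the system is supposed to optimize.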

When it comes down to it, the end goal in prediction is just to predict whether someone would like something, and/or present them with a list of the things you are most certain they'd like. In the analysis context (as with much of HCI), the scales are being used to draw qualitative conclusions about tasks and preferences, so it makes sense to directly attack erroneous modeling and assumptions, because they can lead to wrong conclusions. But for prediction, erroneous modeling only really matters to the extent that it means we're: 1) optimizing the wrong thing; or 2) doing optimization suboptimally.

#1 is important to get right, but #2 is more of a "whatever works" sort of thing, and we even have fairly good automatic methods for deciding. If treating ratings as numerical data empirically leads to good predictions, then it's fine to do; if not, then it's best avoided. Many recent systems avoid even having a human make those kinds of decisions, by throwing in a giant bag of possible ways of slicing the data, and then handing off the decision about which of them to use, and how to weight them, to an ensemble method. IIRC, that's what the winning Netflix-prize entry was like.
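A toy version of "let the data decide how to weight them" might look like the sketch below. It is not the Netflix-prize method, just a grid search over a single blend weight for two hypothetical predictors, scored by RMSE on a hypothetical holdout set. The names `numeric_model` and `ordinal_model` are illustrative only.

```python
import math

def rmse(preds, truth):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth))

def best_blend(preds_a, preds_b, truth, steps=20):
    """Grid-search w in [0, 1] so that w*a + (1-w)*b minimizes holdout RMSE."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        err = rmse(blended, truth)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Hypothetical holdout ratings and the outputs of two hypothetical predictors,
# e.g. one that treats ratings as numbers and one that treats them as ordinal.
truth = [4, 2, 5, 3, 1]
numeric_model = [3.5, 2.5, 4.0, 3.5, 2.0]
ordinal_model = [4.5, 1.5, 4.5, 2.0, 1.5]

w, err = best_blend(numeric_model, ordinal_model, truth)
print("weight on numeric model:", w, "blended RMSE:", err)
```

Because w = 0 and w = 1 are both on the grid, the blend can never score worse on the holdout set than the better of the two inputs. A real ensemble would use many more predictors and guard against overfitting the holdout data.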

1 comments

I'd add an alternative:

  #3: Suggest items that they are likely to really love.
This is subtly different from predicting what the user is most likely to like. To optimize for RMSE scoring, you are better off suggesting a sure "4" than a risky "5". For buying an expensive item like a car or a stereo, the safe bet might be a good approach. But for books, music, or movies (easily sampled, one of a series) I'd be much more excited by a system that can predict A+ items with even 25% probability than one that offers up straight B items with 80% consistency.
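A rough sketch of the difference, using made-up rating distributions on a 1-5 scale (both distributions are assumptions chosen to echo the 80% and 25% figures above). Under squared error the best single-number prediction for an item is its expected rating, so an RMSE-tuned system ranks by the mean, whereas option #3 ranks by the chance of a top rating.

```python
# Hypothetical per-item rating distributions: rating -> probability.
safe = {4: 0.8, 3: 0.2}     # a reliable "B": almost always a 4
risky = {5: 0.25, 2: 0.75}  # a long shot: a 5 a quarter of the time

def expected_rating(dist):
    """Mean rating; this is the point prediction that minimizes expected squared error."""
    return sum(r * p for r, p in dist.items())

def p_love(dist, threshold=5):
    """Probability that the rating is at or above the 'really love it' threshold."""
    return sum(p for r, p in dist.items() if r >= threshold)

items = {"safe": safe, "risky": risky}
by_mean = max(items, key=lambda k: expected_rating(items[k]))
by_love = max(items, key=lambda k: p_love(items[k]))

print("ranked first by expected rating:", by_mean)
print("ranked first by P(rating = 5):", by_love)
```

The two criteria disagree: the expected-rating ranking surfaces the safe item, while ranking by the probability of a 5 surfaces the risky one.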