Hacker News
by fritzw 3261 days ago
I think Netflix probably did it because people are inherently bad at being objective. For the average person, giving 2 stars to one movie and 4 to another isn't based on anything measurable; even they couldn't explain it. I'm shocked at some of the Amazon product reviews, most of which are 5-star reviews even if the product is absolutely terrible. Movies are different from products, but it's the same people doing the reviewing. Remember, the average user is not a thinking, analytical HN user. Average people are much better at boolean choices.

Whether the buttons do anything, I have no idea.

6 comments

I know I'm in the minority here, but I am a big fan of the new system. I would torture myself trying to decide between, e.g., 3 or 4 stars for a movie. And then I'd go back and re-rate other movies that I realized I liked more but had rated lower than the just-rated movie.

Their % match numbers are fairly accurate, but I have had to go into the watch history and delete the occasional movie watched and finished that we actually hated. No number of 1-stars (or thumbs downs) would eradicate its effects on the recommendations.

My system was (and is, in DVDs):

  5 - Absolutely loved it, will buy a disc
  4 - Good, but won't buy a disc
  3 - Movie was okay
  2 - Not a good movie
  1 - Stopped watching 20 minutes into it
My problem with binary choice is that 1 == 2 and 3 == 4 == 5, whilst 1 and 5 were very special for me. :(
Plus the scale bias differing vastly between people and cultures makes the data a mess. Like, say, for me a 5 means 100% perfect. That's why discrete choice stuff is all the rage in the market research world (unless that's changed in the last few years).

Asking people "which of these 3 things you like best" vs. "rate these 3 things 1-5" will usually give you much more useful data, plus be easier for respondents.

Popular recommendation algorithms like collaborative filtering by matrix factorization account for user and item biases (the simplest method is to normalize a particular user's ratings by the average of that user's ratings).
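For what it's worth, the simplest version of that normalization (subtracting each user's own average from their ratings) could be sketched roughly like this. The function name and the toy ratings are made up for illustration; this is not what Netflix or any particular library actually does.

```python
from statistics import mean

def center_by_user(ratings):
    """Subtract each user's mean rating from that user's ratings.

    ratings: dict mapping user -> dict mapping item -> stars (hypothetical layout).
    """
    centered = {}
    for user, items in ratings.items():
        user_mean = mean(items.values())
        centered[user] = {item: stars - user_mean for item, stars in items.items()}
    return centered

# Toy data: one harsh rater and one generous rater who agree on the ordering.
ratings = {
    "harsh": {"A": 1, "B": 2, "C": 3},
    "generous": {"A": 3, "B": 4, "C": 5},
}
centered = center_by_user(ratings)
print(centered)
```

After centering, both users end up with the same relative scores for A, B and C, even though their raw stars never overlap much. Real matrix factorization models learn user and item bias terms jointly rather than just subtracting a mean, so treat this as the crude baseline only.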
Couldn't you control for that by weighting people's ratings by the range in which they provide them? Like weighting a 5-star review a bit more from someone who averages 3s than from someone whose ratings average 4s? Far from perfect, sure, but I bet it could save a lot of results from needing to be thrown out.
With stars you can cross-compare with others to see if they gave the same score. With simple thumbs-up recommendations you cannot compare the ratings, as the score is just whether it appeals to you or not.
If what you say is true, that's an argument in favor of, e.g., aggregating via median instead of average.
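A tiny illustration of the difference, with made-up star ratings (nothing here comes from real data): one out-of-line rating drags the average down but leaves the median alone.

```python
from statistics import mean, median

# Hypothetical ratings for one movie: four people agree, one is way off.
stars = [4, 4, 4, 4, 1]

avg = mean(stars)    # pulled toward the single 1-star rating
mid = median(stars)  # middle value of the sorted ratings

print(avg, mid)
```

The catch is that with only five possible values, medians tie a lot, so you would probably still need some tiebreaker to rank movies.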