by EvanMiller 3137 days ago
I recommend the approach described in this article:

http://www.evanmiller.org/ranking-items-with-star-ratings.ht...

In this formulation, s_k equals utility. Like the Wilson score formula (and unlike the linked article), the provided equation takes into account the variance of the expected utility.
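For anyone who wants to see the shape of that calculation, here is a minimal sketch. It assumes a Dirichlet prior of all ones over the star levels and ranks by the posterior mean utility minus z posterior standard deviations. The function name, the default utilities (k stars is worth k), and the default z are my own choices for illustration, not taken from the article.

```python
import math

def utility_lower_bound(counts, utilities=None, z=1.65):
    """Lower bound on expected utility under a Dirichlet(1, ..., 1) prior.

    counts[k] is the number of ratings at level k (index 0 = 1 star).
    utilities[k] is s_k; assumed here to default to 1, 2, ..., K.
    """
    K = len(counts)
    if utilities is None:
        utilities = list(range(1, K + 1))
    total = sum(counts) + K  # sum of the Dirichlet posterior parameters
    # Posterior mean of sum_k s_k * p_k
    mean = sum(s * (n + 1) for s, n in zip(utilities, counts)) / total
    # Posterior mean of sum_k s_k^2 * p_k
    second = sum(s * s * (n + 1) for s, n in zip(utilities, counts)) / total
    # Variance of the expected utility under the Dirichlet posterior
    variance = (second - mean ** 2) / (total + 1)
    return mean - z * math.sqrt(variance)

# Two 5-star ratings rank below a hundred 5-star ratings.
few = utility_lower_bound([0, 0, 0, 0, 2])
many = utility_lower_bound([0, 0, 0, 0, 100])
```

The variance term is what shrinks as ratings accumulate, so an item with few ratings gets pulled further below its mean than one with many.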


I find that article very hard to follow -- there are lots of detailed formulas, but no obvious place where the prior distribution is discussed, or the utility score given to different star ratings. And the examples are all very abstract.

Edit to add: ah, I think I see, the utility of N stars is assumed to be N, and the prior is all ones. But aren't those the most important things to tune in a Bayesian model?

Another practical Bayesian approach, which is much easier to understand and to productionize, is described here: https://www.johndcook.com/blog/2011/09/27/bayesian-amazon/

It does assume a Beta(1,1) prior, however.
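To make the Beta(1,1) idea concrete, here is a rough sketch of the kind of calculation it leads to, not necessarily how the linked post does it. With a uniform prior, an item with p positive and n negative reviews has a Beta(p+1, n+1) posterior, and you can estimate the probability that one item is really better than another by sampling. The function names and the review counts in the example are made up.

```python
import random

def posterior_mean(positive, negative):
    """Mean of the Beta(positive + 1, negative + 1) posterior."""
    return (positive + 1) / (positive + negative + 2)

def prob_first_is_better(a, b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_a > rate_b).

    a and b are (positive, negative) review counts; each gets a
    Beta(1, 1) prior, so the posteriors are Beta(pos + 1, neg + 1).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(a[0] + 1, a[1] + 1)
        rate_b = rng.betavariate(b[0] + 1, b[1] + 1)
        if rate_a > rate_b:
            wins += 1
    return wins / draws

# Hypothetical sellers: 90 of 100 positive versus 2 of 2 positive.
estimate = prob_first_is_better((90, 10), (2, 0))
```

The seller with a perfect record from two reviews does not automatically win, because its posterior is still very wide.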

With star ratings, I think an important point that often gets ignored is: different people use stars in different ways. One user might 5-star most things, but give the occasional 4- or 3-star review if they have a problem. But another user might 3-star by default, and save their 4- and 5-star reviews for exceptionally good cases.

I wonder if a simple way to fix that might be to reinterpret everyone's star ratings as percentiles, based on the overall distribution of stars in their reviews. "This user gives 5 stars 10% of the time, so we'll interpret a 5-star review from them as anything in the range 90-100 -- assume 95%."

You would probably also want to reinterpret the results for each user. "These review scores average out to 84%. For user A, that's 4.5 stars, but for user B, it's only 3.5 stars."
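A quick sketch of what I mean, with made-up function names. Each star value a user has given maps to the midpoint of its percentile band in that user's own rating history, and an aggregate percentile maps back to whichever star band contains it for that user. This version only returns whole stars, so it would not produce the half-star values in my example without some interpolation.

```python
from collections import Counter

def star_percentiles(user_ratings):
    """Map each star value this user has given to the midpoint
    of its percentile band within the user's own ratings."""
    counts = Counter(user_ratings)
    total = len(user_ratings)
    result = {}
    below = 0  # number of this user's ratings strictly below the current star
    for star in sorted(counts):
        share = counts[star]
        result[star] = 100 * (below + share / 2) / total
        below += share
    return result

def percentile_to_stars(percentile, user_ratings):
    """Return the star value whose percentile band contains
    `percentile` for this user (whole stars only)."""
    counts = Counter(user_ratings)
    total = len(user_ratings)
    below = 0
    for star in sorted(counts):
        below += counts[star]
        if percentile <= 100 * below / total:
            return star
    return max(counts)

# A user who gives 5 stars 10% of the time and 4 stars otherwise.
picky = [4] * 9 + [5]
bands = star_percentiles(picky)  # the 5-star band is 90-100, midpoint 95
```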

The big downside is that star ratings become subjective. But they're already subjective, and ignoring that problem doesn't make the results any better. Average star ratings on all the big websites and app stores right now are garbage -- they'll usually warn you if some Amazon product is terrible, but that's about all.

If you crunch all the review data and figure out the best possible recommendations, you end up with collaborative filtering and the Netflix Prize. It's a shame that so much great work was done for that competition, but nobody seems to be using it. Netflix themselves just use a trivial upvote scheme now.

But I wonder if there's some much simpler approach that still gets pretty good results.

Or even a simple thumbs up or thumbs down. Less open to interpretation on how the user uses stars. 1 star or 5 star basically.

I wrote this a couple of years ago [1]. I think we need to remove subjectivity on ratings by asking more specific questions and only allowing a binary answer.

1. Is the food good? 2. Is the service good? 3. Is the atmosphere good?

That's a pretty simple answer. Often when I see 1 star reviews it's because of a single element of the experience but not the overall experience.

It's easier to leave a review because there's less cognitive load. It's easier to search for what you want: if I have my foodie hat on, I don't particularly care about the service. If it's a night out with a customer, that becomes more important all of a sudden.

And then you can generate some sort of average score based on the answers to these questions to calculate the 5 star rating.
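As a rough illustration of that last step, here is one possible mapping, which I am assuming purely for the example: take the fraction of "yes" answers per question, combine them with optional weights, and scale linearly onto 1 to 5 stars. The question names, function names, and weights are all hypothetical.

```python
def question_scores(reviews):
    """reviews is a list of dicts mapping question name -> True/False.
    Returns the fraction of 'yes' answers for each question."""
    totals, yes = {}, {}
    for review in reviews:
        for question, answer in review.items():
            totals[question] = totals.get(question, 0) + 1
            yes[question] = yes.get(question, 0) + (1 if answer else 0)
    return {q: yes[q] / totals[q] for q in totals}

def star_rating(reviews, weights=None):
    """Weighted average of per-question 'yes' fractions, scaled to 1-5.
    weights must cover every question; the default weights them equally."""
    scores = question_scores(reviews)
    if not scores:
        raise ValueError("no answers to aggregate")
    if weights is None:
        weights = {q: 1.0 for q in scores}
    weight_sum = sum(weights[q] for q in scores)
    fraction = sum(scores[q] * weights[q] for q in scores) / weight_sum
    return 1 + 4 * fraction

reviews = [
    {"food": True, "service": False, "atmosphere": True},
    {"food": True, "service": True, "atmosphere": True},
]
overall = star_rating(reviews)
# With my foodie hat on, only the food question counts.
foodie = star_rating(reviews, {"food": 1.0, "service": 0.0, "atmosphere": 0.0})
```

Keeping the per-question fractions around is what makes the "foodie hat" search possible; the single star number is just one view of them.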

[1] https://medium.com/@acrooksie/no-more-5-star-rating-systems-...

I do prefer that over stars, but I think it potentially misses some information. Let's say most people answer "good" for all the categories. Does that just mean the place is good overall, or is it fantastic?

To put it another way, how do you distinguish the 4.0-star places from the 4.9-star places?

With conventional star ratings, you're reliant on most people using stars consistently. With a series of yes/no questions, you're relying on a potentially small pool of "no" answers to give you a useful signal.

I think stack ranking would be much more powerful. "How does this place compare to others? Average, better than average, in your all time top 5?" Everybody's feedback would be completely clear. It's not obvious how to aggregate that into a single rating number though.

Evan, I'm confused. The author refers to two articles by you and here is a third. Can you comment on where they come from, and in what order?