by EvanMiller 3881 days ago
The method described here is simple because it's only looking at the mean of the belief about each item; it uses the prior belief as a way either to sandbag new items or to give them a bump. I tend to advocate methods that take into account the variance of the belief in order to minimize the risk of showing bad stuff at the top of the heap.

I have a newer article (not mentioned here) that ranks 5-star items using the variance of the belief. It ends up yielding a relatively simple formula, or at least a formula that doesn't require special functions. Like the OP I use a Dirichlet prior, but then I approximate the variance of the utility in addition to the expected utility:

http://www.evanmiller.org/ranking-items-with-star-ratings.ht...
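
To make the idea concrete, here is a minimal sketch of a "posterior mean minus a multiple of the posterior standard deviation" score. It assumes a uniform Dirichlet prior (one pseudo-rating per star level) and a normal approximation for the percentile point; the function name, the choice z = 1.645, and the sample items are illustrative assumptions, and the linked article's exact formula may differ.

```python
import math

def lower_bound_score(counts, z=1.645):
    """Posterior mean of the star rating minus z posterior standard deviations.

    counts[k] is the number of (k+1)-star ratings. Assumes a uniform
    Dirichlet prior, i.e. one pseudo-rating added to every star level.
    """
    K = len(counts)
    N = sum(counts)
    total = N + K  # sum of the posterior Dirichlet parameters
    stars = range(1, K + 1)
    # Expected utility: stars weighted by posterior mean proportions.
    mean = sum(s * (n + 1) for s, n in zip(stars, counts)) / total
    second_moment = sum(s * s * (n + 1) for s, n in zip(stars, counts)) / total
    # Variance of the expected rating under the Dirichlet posterior.
    variance = (second_moment - mean ** 2) / (total + 1)
    return mean - z * math.sqrt(variance)

# Hypothetical items, as counts of 1-star through 5-star ratings.
items = {
    "no ratings": [0, 0, 0, 0, 0],
    "one 5-star rating": [0, 0, 0, 0, 1],
    "fifty 5-star ratings": [0, 0, 0, 0, 50],
}
for name, counts in items.items():
    print(name, round(lower_bound_score(counts), 3))
```

The only inputs are the per-star counts, so no special functions are needed. A larger z is more conservative about items with few ratings.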

The weakness of this approach (and of the OP's) is that it doesn't really define a loss function for decision-making (i.e. it doesn't properly account for the costs of an incorrect belief), which one might argue is the whole point of being a Bayesian in the first place. In practice it seems that using a percentile point on the belief ends up approximating a multi-linear loss function, but I haven't worked out why that is.

3 comments

This is interesting stuff, but I wonder whether anyone has verified the results in practice. These methods are all quite simple. They assume, for example, that the quality of an item is independent of the quality of the surrounding content. This is clearly not true. When Steve Jobs died, for example, no other news in the tech community was going to get air time. There is also the need for a variety of content. I think we all know how boring it is to read endless "I wrote X in Y" posts on HN, where X is some simple software system like a blog and Y is the language du jour (Node.js / Go / whatever).

In the machine learning community the above problems are addressed with submodular loss functions, bandit algorithms, and no doubt other methods I don't know about. Now I don't value complexity for its own sake, so I wonder if the additional power these approaches bring is warranted.

> I tend to advocate methods that take into account the variance of the belief in order to minimize the risk of showing bad stuff at the top of the heap.

Penalizing variance would be the opposite of my intuition. Given a boring low-variance item with ten 3-star votes and a divisive item with five 1-star votes and five 5-star votes, I'd think you'd want the one at the top to be the item with a medium chance that they'll "love" it rather than the one with a high chance they'll find it passable.

If you further assume that the average person is going to check out the top few results but only "buy" if they find something they really like, the risky approach seems even more appealing. A list topped by known mediocre choices has a low chance of "success". What's the scenario you are envisioning?
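
A quick sketch of the two items in this example: they have the same expected rating but very different chances of a 5-star ("love it") reaction. It assumes the same kind of uniform Dirichlet prior (one pseudo-rating per star level) discussed upthread; `posterior_share` is a hypothetical helper name, and treating a 5-star rating as "love" is an assumption made only for illustration.

```python
def posterior_share(counts, star):
    """Posterior mean probability of a given star level (1-based),
    assuming a uniform Dirichlet prior: one pseudo-rating per level."""
    K = len(counts)
    return (counts[star - 1] + 1) / (sum(counts) + K)

# Counts of 1-star through 5-star ratings.
boring = [0, 0, 10, 0, 0]    # ten 3-star votes
divisive = [5, 0, 0, 0, 5]   # five 1-star and five 5-star votes

for name, counts in (("boring", boring), ("divisive", divisive)):
    mean = sum(s * posterior_share(counts, s) for s in range(1, 6))
    love = posterior_share(counts, 5)
    print(f"{name}: expected rating {mean:.2f}, P(5 stars) {love:.2f}")
# boring: expected rating 3.00, P(5 stars) 0.07
# divisive: expected rating 3.00, P(5 stars) 0.40
```

Ranking by the mean alone cannot separate these two items; which one belongs on top depends on whether the goal is avoiding bad results or maximizing the chance of a hit.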

The kind of divisive item you describe is rare, at least on Amazon. What happens most commonly is that everyone loves something or everyone hates it, with some noise (e.g. 10% 1 or 2 star reviews). In this case, it makes sense to promote the item that has a 4.5 mean score and 100 reviews over one that has a 4.7 mean score and only 5 reviews. You want to account for the uncertainty when there are few ratings. If you don't, all the items at the top of your search results will be 5-star 1-review products.
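
The comparison above can be sketched by sampling the belief directly and taking a low percentile of it. This is only an illustration under assumptions: a uniform Dirichlet prior (one pseudo-rating per star level), made-up star distributions for each item, and hypothetical function names. Five whole-star ratings cannot average exactly 4.7, so the small item here averages 4.8.

```python
import random

def sample_expected_rating(counts, rng):
    """Draw one expected star rating from the Dirichlet posterior."""
    # A Dirichlet draw is a set of independent Gamma draws, normalized.
    gammas = [rng.gammavariate(n + 1, 1.0) for n in counts]
    total = sum(gammas)
    return sum((k + 1) * g for k, g in enumerate(gammas)) / total

def percentile_score(counts, q=0.05, draws=4000, seed=0):
    """Approximate the q-th percentile of the belief by Monte Carlo."""
    rng = random.Random(seed)
    samples = sorted(sample_expected_rating(counts, rng) for _ in range(draws))
    return samples[int(q * draws)]

# Assumed counts of 1-star through 5-star ratings.
products = {
    "100 reviews, mean 4.5": [0, 0, 10, 30, 60],
    "5 reviews, mean 4.8": [0, 0, 0, 1, 4],
    "1 review, 5 stars": [0, 0, 0, 0, 1],
}
ranked = sorted(products, key=lambda name: percentile_score(products[name]),
                reverse=True)
for name in ranked:
    print(name, round(percentile_score(products[name]), 2))
```

Sorting by the raw mean would put the single 5-star review first; sorting by the low percentile puts the well-reviewed item first because its belief is much tighter.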

I saw this post's headline and it reminded me of a "how not to sort by average rating" post I read many years ago. I looked it up on Google [1], read it, clicked through to this HN thread... lo and behold, the top comment was written by the author of the article I just looked up. Great stuff.

[1] http://www.evanmiller.org/how-not-to-sort-by-average-rating....