HN Mirror


by acrooks 3136 days ago

Given a set of questions - e.g. "how's the food", "how's the atmosphere", "how's the service", etc. - you could figure out how the restaurant scores relative to others by stack ranking on the % of answers to a particular question that were "Yes". The numbers should hopefully follow a normal distribution, and from there you get your /5 rating.
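A minimal sketch of that stack ranking, assuming one yes/no question and made-up answer data. The names `yes_fraction` and `rank_to_stars` are hypothetical, and the five equal-sized rank buckets are an assumption standing in for whatever distribution-based cutoffs you would actually pick:

```python
from bisect import bisect_left

def yes_fraction(answers):
    """Share of 'yes' answers (True) to one question at one restaurant."""
    return sum(answers) / len(answers)

def rank_to_stars(fractions):
    """Turn each restaurant's yes-fraction into 1-5 stars by its rank among all restaurants."""
    ordered = sorted(fractions.values())
    n = len(ordered)
    stars = {}
    for name, frac in fractions.items():
        below = bisect_left(ordered, frac)  # restaurants scoring strictly lower
        # Integer arithmetic splits the ranking into five equal buckets.
        stars[name] = 1 + min(4, below * 5 // n)
    return stars

# Hypothetical answers to "how's the food?" (True = yes).
food_answers = {
    "A": [True, True, True, True],
    "B": [True, True, True, False],
    "C": [True, True, False, False],
    "D": [True, False, False, False],
    "E": [False, False, False, False],
}
fractions = {name: yes_fraction(a) for name, a in food_answers.items()}
stars = rank_to_stars(fractions)
```

Running the same thing per question would give a separate star rating for food, atmosphere, service and so on.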

If everybody answers "yes" to all of the questions - good value, service, food, atmosphere - then that suggests to me that it's a great restaurant. And you can have a lot of questions that are even asked randomly to limit the number of questions per user.

I rate a lot of places highly that do a lot of things well but don't have great service, because I don't think the service is bad enough to bring the rating down. But that's data that is being lost.

I like your idea of stack ranking but with a different flavour. I think that "in your all time top 5" is a hard question to answer. How about this though - if we know you've been to Taco Place X and now you're going to Taco Place Y, maybe the question is "are the tacos at Y better than X", "is the atmosphere at Y better than X" or even "is Y better than X" (but I like the idea of collecting more granular data).

If you collect this data to stack rank, then it definitely gives you a better distribution of restaurants relative to each other in each category.
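One rough way to turn those "is Y better than X" answers into a ranking is to sort by the fraction of comparisons each place wins. This is only a sketch: `rank_by_wins` and the sample answers are made up, and a win fraction is the simplest possible choice. A model such as Bradley-Terry or Elo would cope better when some places are compared far more often than others.

```python
from collections import defaultdict

def rank_by_wins(comparisons):
    """comparisons: (winner, loser) pairs, one per 'is Y better than X' answer."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    # Fraction of comparisons each place won; places with no wins score 0.
    rate = {name: wins[name] / games[name] for name in games}
    return sorted(rate, key=lambda name: rate[name], reverse=True)

# Hypothetical answers for one category, e.g. "are the tacos better?"
taco_answers = [("Y", "X"), ("Y", "X"), ("X", "Y"), ("Y", "Z"), ("X", "Z")]
ranking = rank_by_wins(taco_answers)
```

Keeping one list of comparisons per category (tacos, atmosphere, overall) preserves the granular data.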

As a consumer, with this level of granularity, I can select what I care about tonight. If I'm grabbing takeout for lunch at work, does a five star rating even matter? I should ask Siri "show me the top fast and delicious takeout restaurants near me" and she should do: "select name from restaurants where distance < 500m order by (speed + flavour) desc limit 3;" and from there I will pick something from that list that looks nice. That seems like a nice UX.
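For what it's worth, here is a runnable version of that query using Python's built-in `sqlite3` module. The table layout, the column names (`distance_m`, `speed`, `flavour`) and the rows are all invented for illustration. It sorts with `DESC` so the highest combined scores come first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE restaurants (name TEXT, distance_m REAL, speed REAL, flavour REAL)"
)
# Hypothetical per-category scores on a 1-5 scale.
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?, ?)",
    [
        ("Taco Place X", 120, 4.5, 4.0),
        ("Taco Place Y", 300, 3.0, 4.8),
        ("Noodle Bar", 450, 4.0, 3.0),
        ("Slow Bistro", 200, 1.5, 4.9),
        ("Far Diner", 900, 5.0, 5.0),  # great scores, but too far away
    ],
)
rows = conn.execute(
    "SELECT name FROM restaurants "
    "WHERE distance_m < ? "
    "ORDER BY (speed + flavour) DESC LIMIT 3",
    (500,),
).fetchall()
top = [name for (name,) in rows]
conn.close()
```

Swapping the `ORDER BY` expression is all it takes to ask a different question, such as atmosphere plus service for a dinner out.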


hysthola 3136 days ago

There's a body of research on this, and it suggests that ratings are more meaningful if you add options, up to about 5 or 6 ratings.

That is, if you asked people to do the ratings once, and then asked them 1 hour later, there would be more consistency across time as you add options from 2 to 3 to 4, up to about 5 or 6.

The problem with binary ratings is that, as much as you might think otherwise, you're forcing a kind of hazy, grey experiential assessment into 0 or 1. And in doing so, people near the boundary (whatever that might be) will vacillate between them. E.g., people who feel "meh" about something are forced to choose something else, and sometimes they'll say 0 and sometimes 1. The more options you give, the more reliable / meaningful the ratings will be.
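A toy simulation can illustrate the vacillation argument. It is not the research mentioned above, just a sketch under loud assumptions: each person has a fixed underlying opinion drawn from a standard normal, each rating occasion adds independent noise, and the rating is that noisy feeling cut into equal-width options. The helper names (`quantise`, `pearson`, `retest_correlation`) and every parameter value are hypothetical.

```python
import random

def quantise(value, levels, low=-2.0, high=2.0):
    """Map a continuous feeling onto one of `levels` equal-width options (0-based)."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * levels)
    return min(index, levels - 1)

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def retest_correlation(levels, people=5000, noise=0.5, seed=0):
    """Correlation between two ratings of the same thing by the same people."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(people):
        opinion = rng.gauss(0.0, 1.0)  # the hazy underlying assessment
        for occasion in (first, second):
            felt = opinion + rng.gauss(0.0, noise)  # how it feels on the day
            occasion.append(quantise(felt, levels))
    return pearson(first, second)

binary = retest_correlation(levels=2)
five_point = retest_correlation(levels=5)
```

Under these assumptions the five-option ratings agree with themselves across occasions more closely than the binary ones, because people near the 0/1 boundary flip between answers.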

This example is interesting to me because it's something most people can relate to and illustrates the complications of utility-based and Bayesian formulations of the problem. You end up having to decide on utilities and/or priors.

To me the answer is to weight the data maximally in forming a posterior, in which case you end up using a reference prior. Similar kinds of arguments about utilities lead to reference priors. Reference priors can be complicated to compute, but for things like multinomials over ordinal ratings, reference priors have been worked out fairly well.

To me it always made sense to allow people to sort by the center of the estimate, or the lower bound (maybe using different language).
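As a rough sketch of "center versus lower bound", the snippet below puts a Dirichlet posterior over the five star categories and summarises the expected rating both ways. The reference prior itself is not reproduced here: the symmetric Dirichlet with parameter 0.5 is an assumption used as a simple stand-in, and `posterior_summary`, the 5% quantile and the rating counts are all made up for illustration.

```python
import random

def posterior_summary(counts, prior=0.5, draws=4000, quantile=0.05, seed=0):
    """counts[i] = number of ratings equal to i + 1 stars.

    Returns (posterior mean, lower bound) of the expected star rating under a
    Dirichlet(prior, ..., prior) prior. The lower bound is a Monte Carlo
    estimate of the given posterior quantile.
    """
    rng = random.Random(seed)
    alphas = [c + prior for c in counts]
    total = sum(alphas)
    mean = sum((i + 1) * a for i, a in enumerate(alphas)) / total
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a vector of independent Gamma draws, normalised.
        gammas = [rng.gammavariate(a, 1.0) for a in alphas]
        s = sum(gammas)
        samples.append(sum((i + 1) * g for i, g in enumerate(gammas)) / s)
    samples.sort()
    lower = samples[int(quantile * draws)]
    return mean, lower

few = posterior_summary([0, 0, 0, 0, 6])      # six ratings, all 5 stars
many = posterior_summary([2, 3, 10, 40, 45])  # a hundred mixed ratings
```

Sorting by the mean puts the place with six perfect ratings first, while sorting by the lower bound favours the place with a hundred ratings, which is the choice the sort option would expose to the user.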

iainmerrick 3136 days ago

Slight tangent--

I think 1-4 stars is the ideal rating style. I wish that were used more often.

A choice of 1-4 stars gives you enough freedom to express your opinion, without being overwhelming. It's a small enough range to be reasonably objective (almost everybody will interpret it as 1 star = bad, 2 = passable, 3 = good, 4 = great). And with an even number of choices there's no middle "meh" option -- you're forced to make a choice between 2 and 3.

Of course it's important not to ruin it by adding extra options, like 0 stars or half-stars. (That was Ebert's big mistake!)

Edit to add: to relate this to the parent post, I'm thinking that maybe ranking things as 1-4 stars in several categories could be the best of both worlds.