Hacker News
by crazygringo 3222 days ago
Counterpoint: I almost solely rely on the stars histogram in Yelp (available only on the website, not the app), completely ignoring whatever Yelp's calculated "average" is.

If a place has more 5-star ratings than 4-star ratings, it's generally amazing. If it has more 4-star ratings than 5-star ratings, it's generally fine but not something particularly special.
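The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not anything Yelp provides. The function name `histogram_verdict`, its labels, and the "inconclusive" tie case are hypothetical, and the two places' ratings are made up. Both places average exactly 4 stars, but their histograms lead to different verdicts. That difference is the information the average throws away.

```python
from collections import Counter
from statistics import mean

def histogram_verdict(ratings: list[int]) -> str:
    """Apply the 5-star vs 4-star rule of thumb to a list of star ratings."""
    counts = Counter(ratings)  # star values that never appear count as 0
    if counts[5] > counts[4]:
        return "generally amazing"
    if counts[4] > counts[5]:
        return "generally fine"
    return "inconclusive"  # tie: assumed label, the rule of thumb doesn't cover it

# Two hypothetical places with the same 4.0 average but different histograms.
place_a = [5, 5, 5, 3, 2]
place_b = [4, 4, 4, 4, 4]

for name, ratings in [("A", place_a), ("B", place_b)]:
    print(name, mean(ratings), histogram_verdict(ratings))
```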

Just thumbs up/down would eliminate what is, to me, the single most useful aspect of Yelp.

It doesn't matter that star ratings are arbitrary -- when you average enough of them out, a clear signal overrides the noise. You can distrust any given user, while still trusting the aggregate.
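Here is a rough simulation of the "distrust any given user, trust the aggregate" point. It assumes a simple model in which each user's rating is a place's true quality plus Gaussian noise, rounded and clamped to 1 to 5 stars. The model, the helper names, and the parameter values are all illustrative assumptions. A single rating swings widely. The average over many users varies much less from sample to sample, shrinking roughly as 1/sqrt(n). Clamping does pull the average slightly away from the true quality, so the sketch demonstrates stability, not unbiasedness.

```python
import random
import statistics

def noisy_rating(true_quality: float, rng: random.Random, noise: float = 1.0) -> int:
    # One user's star rating under an assumed model: true quality plus
    # personal noise, rounded and clamped to the 1..5 star range.
    raw = true_quality + rng.gauss(0, noise)
    return max(1, min(5, round(raw)))

def spread_of_average(n_users: int, true_quality: float = 3.8,
                      trials: int = 200, seed: int = 0) -> float:
    # Standard deviation of the average rating across many simulated
    # places that all share the same true quality.
    rng = random.Random(seed)
    averages = [
        statistics.mean(noisy_rating(true_quality, rng) for _ in range(n_users))
        for _ in range(trials)
    ]
    return statistics.stdev(averages)

for n in (1, 10, 100, 1000):
    print(n, round(spread_of_average(n), 3))
```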

(Curiously enough, I don't find any equivalent value on Amazon. On Yelp, you're really evaluating an overall experience along a whole set of dimensions, so there's a lot more to discriminate on. On Amazon, it does seem to be more of a binary evaluation -- does the product work reliably or not?)


I used to think the same thing until I realized that the most accurate and consistent ratings I use regularly are Rotten Tomatoes', and they're based on strict thumbs up/down.

It ensures votes carry equal weight and that "extreme polar" voters don't skew things. It also avoids the opposite problem of voters rating everything as neutral unless it's horrible or incredible.

RT also handles highbrow and lowbrow well. You see fewer votes along the lines of "eh, I didn't love it, but it's sophisticated, so I'll give it an extra star."

I'm sold on simple up/down.

Rotten Tomatoes is good at predicting whether I (or others) will like a movie, but not really at "ranking". Zootopia, one of their top movies of 2016 with a 98% rating, is a good movie, but one I'm unlikely to pursue again. The Godfather (with a 99% rating) is a movie I will pick up on Blu-ray and revisit many times. It's far more than 1% better than Zootopia.

So RT is good at predicting "should I watch this movie I haven't seen before?", but bad at predicting more sophisticated habits or preferences. I wouldn't buy the Blu-ray based on an RT prediction, but I would rent it.

So it becomes a question of what you are trying to accomplish. For some problems up/down is a good solution; for others it isn't.

Rotten Tomatoes actually shows both kinds of rating, which suggests they recognize the limitation you're referring to. In the other one, the average rating, Zootopia has 8.1/10 and The Godfather has 9.2/10, reflecting that difference in quality.
Also, you just aren't the demographic for Zootopia. If you have kids, it's probably worth buying, and they will watch it many times. There are so many film genres that it's best to compare within a single genre rather than across them.

> Rotten Tomatoes is good at predicting a movie I (or others) like, but not really at "ranking". Zootopia, one of their top movies of 2016 and a 98% rating, is a good movie, but one I'm unlikely to pursue again

It feels like you're mixing together two different arguments. Rotten Tomatoes is good at predicting whether someone will like a movie. What is "ranking"? That is a very ill-defined concept. Ranking of what? It's clearly not a ranking by the likelihood of a person liking a movie, because Rotten Tomatoes already does that.

Later you mention the likelihood of rewatching a movie. Rotten Tomatoes' thumbs up or down is based on whether someone liked a movie, so it produces a metric of how likely someone is to like a movie. If, instead of asking "Did you like this movie?" right after someone watched it, Rotten Tomatoes asked "Would you watch this movie again?", it would produce an indicator of rewatchability.

Up/down doesn't matter - it's the question that's being asked.

Note the caveat: RT obviously doesn't actually ask critics these questions. It reads their reviews and interprets them as answering those questions.

In my experience, I find my favorite movies via glowing reviews. Rotten Tomatoes completely obscures this view: if all the reviewers kind of like a film, it'll get 100%, whereas polarizing films always suffer. I'll take "Kids" over "Star Wars" any day as the better movie. Why? I'm gonna see Star Wars because I want to, not because I expect a meaningful aesthetic experience. But Rotten Tomatoes takes the opposite tack, pushing me toward crowd favorites rather than what I might rate highly.
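To make the "everyone kind of likes it" effect concrete, here is a small sketch comparing a thumbs-up percentage with a plain average. It assumes a review counts as positive at 6/10 or above. That cutoff is a stand-in for RT's actual practice of reading each review, and the helper name and scores are hypothetical. With these numbers, the crowd pleaser gets 100% with an average of 7, while the polarizing film gets only 70% despite a 7.9 average.

```python
from statistics import mean

def percent_positive(scores: list[float], threshold: float = 6.0) -> float:
    # Share of reviews counted as a "thumbs up", using an assumed cutoff
    # of 6/10 in place of RT's reading of each review.
    return 100 * sum(score >= threshold for score in scores) / len(scores)

# Hypothetical critic scores out of 10.
crowd_pleaser = [7] * 10                            # everyone "kind of likes it"
polarizing = [10, 10, 10, 10, 10, 10, 9, 4, 3, 3]   # loved or disliked

for name, scores in [("crowd pleaser", crowd_pleaser), ("polarizing", polarizing)]:
    print(name, percent_positive(scores), mean(scores))
```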

Really, this comes down to how terrible one-dimensional comparisons are: they only measure popularity, which is a terrible filter for quality.

I used to research movies religiously on RT, with a lot of success in my mind. With the user rating, the critic rating, and the "top critic" rating, you can infer a surprising amount about who is going to like any given film, and you learn over time where you fall on the critic/top critic/audience graph.

Recently, however, it seems like more (IMO undeserving) movies that are "just OK", like decent but unremarkable romantic comedies and big blockbusters, are scoring above 90%. I might be being curmudgeonly about it, but I've nearly stopped checking it because it feels like there's no information there. My theory is that this started happening once Roger Ebert died: without such a leader in the field, no one is willing to say they didn't like a film unless it's obviously very bad.

I pay a lot of attention to histograms when there are many highly rated options for the same Amazon product type. A histogram that falls off sharply from a peak of 5-star reviews to almost nothing at the 1-star end is the product you want (ignoring fake reviews for the sake of this conversation).

Amassing a bunch of 4- and 5-star ratings is easy, but leaving nothing for even the most habitual complainers to complain about? That's a monumental achievement.

For things like books, I also find that reading the middling reviews often gives the best signal-to-noise ratio. It weeds out the fanboys and those who were clearly not the audience for the book (or just have an ax to grind). You're more likely to get the "I really love this author in general, but I didn't care for this book because 1) 2) 3)."

Agreed. For products on Amazon above a certain star threshold (say, 3+), I evaluate them based on the shape of the review histogram, particularly trying to minimize the size of the bump down at 1 and 2 stars.
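That two-step procedure can be sketched roughly like this. First, drop products whose average falls below the threshold. Then pick the product with the smallest share of 1- and 2-star reviews. The helper names, the `{stars: count}` histogram format, and the example data are all hypothetical. With this data, product C is filtered out because it averages 2.2 stars. Product A beats B because only 2% of its reviews are 1 or 2 stars, compared with 18% for B.

```python
def mean_stars(hist: dict[int, int]) -> float:
    # Average star rating from a {stars: count} histogram.
    total = sum(hist.values())
    return sum(stars * count for stars, count in hist.items()) / total

def low_star_share(hist: dict[int, int]) -> float:
    # Fraction of reviews that are 1 or 2 stars: the "bump" to keep small.
    total = sum(hist.values())
    return (hist.get(1, 0) + hist.get(2, 0)) / total

def pick_product(products: dict[str, dict[int, int]], min_average: float = 3.0) -> str:
    # Keep products at or above the star threshold, then choose the one with
    # the smallest 1- and 2-star bump. Raises ValueError if none qualify.
    eligible = {name: h for name, h in products.items() if mean_stars(h) >= min_average}
    return min(eligible, key=lambda name: low_star_share(eligible[name]))

# Hypothetical histograms: {stars: number of reviews}
products = {
    "A": {5: 80, 4: 15, 3: 3, 2: 1, 1: 1},
    "B": {5: 70, 4: 10, 3: 2, 2: 3, 1: 15},
    "C": {5: 10, 4: 10, 3: 10, 2: 30, 1: 40},
}
print(pick_product(products))
```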