by mpotter 6071 days ago
Hi, I'm Mike from Steepster. We thought we'd share with HN the new ratings system we just deployed, as we think it's relevant for products with customer reviews, ratings, etc. It's our attempt to combat the 4.3 dilemma (discussed here recently: http://news.ycombinator.com/item?id=883890).

Background: Steepster is a community site for tea drinkers to share their tasting notes, get recommendations, and discover new teas.

Feedback appreciated!

3 comments

Mike, great job on this. Very informative. I have 2 questions for you:

1. Is your slider from jQuery UI or another JS framework?

2. With regard to combating the 4.3 dilemma, have you found the average ratings on Steepster to be lower? Maybe it's too early to tell, but I'd love to see some sort of curve on your ratings distribution in a future post ...

thanks

Thanks, callmeed.

1. Yep, slider implementation is jQuery UI.

2. It is too early to tell, but we're definitely planning to share a follow-up. As mentioned in the post, we had a simple thumbs up/down for ratings and were seeing a greater than 90% positive average, so we were definitely experiencing that bias. Just today, albeit with far too small a sample size, we're starting to see a more diverse mix of averages. We still expect to have that positive skew, but because we're now operating with a 100-point scale in the UI, we hope the granularity will help users distinguish subtler differences in rating.

It'd also be interesting to know if the number of ratings decreases or increases. I wonder if your users will find the added granularity a nuisance or an incentive.

It will be interesting. It's important to note the nature of our community and whom we expect to contribute. Generally, we're geared toward a more passionate user who we find to be more than willing to contribute at this level of granularity. So we've made the choice to cater toward their needs while still trying to remain accessible.

But, this is a good point, and I think an important one to consider when evaluating the mechanic that works best for your community/site.

I'm late to comment on this post, but I've read several posts on rating systems recently and not many seemed to mention this.

Have you tested the theory that response bias is skewing the ratings upward?

In other words, given that not every tea drinker is going to bother logging on to your site, finding each tea they've tasted and leaving a rating, it seems plausible to me that people who've had a good tea-drinking experience are more likely to make that effort than those who've had an unremarkable tea.

The figures that you quote seem to support this theory. With a yes/no rating system you had 90% yes votes, or an average vote (assuming one yes vote cancels out one no) of 0.8, skewed 80% up from the unweighted mean you'd expect. With a 5-star rating you expect an average rating of 4.3, which is 65% up from the unweighted mean ((4.3 - 3.0) / (5.0 - 3.0)). So adding granularity is decreasing the skew, but not very rapidly. I'd be very interested to know what your averages are like now, with your new system.
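The arithmetic in the paragraph above can be checked with a short sketch. The helper name `normalized_skew` is made up for illustration, and it assumes, as the comment does, that a thumbs up counts as +1 and a thumbs down as -1, and that a 5-star scale runs from 1 to 5 with a midpoint of 3.0.

```python
def normalized_skew(mean, low, high):
    """Return how far `mean` sits above the scale midpoint, as a fraction
    of the distance from the midpoint to the top of the scale."""
    midpoint = (low + high) / 2
    return (mean - midpoint) / (high - midpoint)


# Thumbs up/down: score yes as +1 and no as -1, so 90% yes averages 0.8.
thumbs_mean = 0.9 * 1 + 0.1 * -1
thumbs_skew = normalized_skew(thumbs_mean, -1, 1)

# Five-star scale (assumed to run 1 to 5) with the 4.3 average quoted above.
stars_skew = normalized_skew(4.3, 1, 5)

print(round(thumbs_skew, 2), round(stars_skew, 2))
```

The same function could be applied to the new 100-point scale once enough ratings are in, which would put all three systems on a comparable footing.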

Granted, some of the skew is due to what teas are available to be rated: presumably people are less likely to enter awful teas into your system in the first place. I realise the point of your article is about redesigning the rating to combat that skew, rather than necessarily about finding the one true rating. But if you're concerned about bias, it seems worth at least investigating all possible sources of bias.

There's an obvious asymmetry in this hypothesis - why would the response bias be in favour of strong positive experiences, rather than just strong experiences in general? Even if drinkers of mediocre tea can't be bothered to vote, why wouldn't people who've had terrible tea be just as likely to vote as those who've had great tea? I can think of two explanations. One is an innate sense that a good experience is worth more effort than a bad one - so after a bad pot of tea, there's less of an impulse to run off and tell everyone how bad it was, more to just write it off and go do something else. Another is that people motivated to tell people about bad experiences might want to do so in words, to explain what was so bad about it.

I realise this is all hand-waving - I don't have hard evidence to back up this theory. I do think it would be an interesting theory to test, for those with sufficient levels of usage of a rating system to do so.

Some anecdotal evidence comes from my use of other UGC review sites, particularly restaurant reviews. These sites usually have a disproportionately large number of negative written reviews. Reading reviews for nearly any restaurant leads you to conclude that the restaurant has terrible service - apparently because those who've had bad experiences are always keen to vent about them. Yet nearly all restaurants, even those with tens of vocal unhappy customers, have ratings above the midpoint of the scale.

Very clever. Have you considered making the slider non-linear (the distance on the slider between Yuck and Meh is smaller than between Good and Awesome)? If most people are going to rate their tea somewhere between Good and Awesome, it allows more of the slider to be used.

Have you received enough ratings with this new interface to know if my assumption of ratings being clustered is accurate?
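For anyone wanting to try the non-linear idea, here is a rough sketch of one way it might be done: a piecewise-linear mapping from slider position to a 100-point rating. The anchor positions and the `slider_to_rating` name are invented for illustration and are not taken from Steepster's implementation.

```python
from bisect import bisect_right

# Hypothetical anchors: (slider position in 0..1, rating in 0..100).
# The lower half of the rating scale is squeezed into the first 20% of the
# track, leaving 80% of the track for ratings between 50 and 100.
ANCHORS = [(0.0, 0), (0.2, 50), (1.0, 100)]


def slider_to_rating(position):
    """Map a slider position in [0, 1] to a rating by linear interpolation
    between the two anchors that surround it."""
    position = min(max(position, 0.0), 1.0)  # clamp out-of-range input
    xs = [x for x, _ in ANCHORS]
    i = min(bisect_right(xs, position), len(ANCHORS) - 1)
    (x0, y0), (x1, y1) = ANCHORS[i - 1], ANCHORS[i]
    return round(y0 + (position - x0) / (x1 - x0) * (y1 - y0))


for pos in (0.0, 0.1, 0.2, 0.6, 1.0):
    print(pos, slider_to_rating(pos))
```

With these made-up anchors, a small move near the left end changes the rating quickly, while the same move near the right end changes it slowly, which is the trade-off the suggestion implies.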

We haven't considered making it non-linear. It's an interesting suggestion, though I'd hesitate to go that way only because the user then lacks a clear 1:1 model of how the slider directly affects their rating (without explanation). We haven't received enough ratings yet to prove your assumption. When we do and if it does hold true, our assumption now is that we have _enough_ of a scale to expose meaningful differences.

You've given us good food for thought, thanks. My general feeling now is that it's important to leave the negative portion of the slider intact (however little it's used) to maintain a solid mental model. Might be something to test down the road though.