by iambateman 3131 days ago
You know how people say: “on a scale of 1-10”?

I’ve used the principle of Miller's law to start asking people to measure on a scale of 1-7.

Universally, people balk at the scale. But I explain to them that most people can’t tell the difference between 2 and 3 on a ten point scale. If you can’t articulate a difference, there’s no use in the measurement.

Seven is great because you get more than the simplicity of 1-5. So...

  1 - the worst
  2 - bad
  3 - below avg
  4 - average
  5 - above average
  6 - good
  7 - the best

And don’t even THINK about responding with “5.5”. ;)

7 comments

There is actually a whole theory around this, and it has lots of implications for the design of hedonic and sensory scales.

I work on modeling human sensory perception and preference of food and beverage products, and have had to design scales that work as a true "metric".

Most scales suffer from 3 primary problems:

1) avoidance of the endpoints

2) tendency towards the mean

3) minimum information gain

For example, on a 10 point scale, very few respondents (< 0.5%) will mark a 1 or 10 (this is problem 1). In addition, 5s are overrepresented vs. the expected number of 4s and 6s (problem 2).

These problems together reduce the amount of information inferable from the collected data. There are a number of ways to measure this, including information theory (think of the avoidance of the end points and tendency towards the mean as a lossy compression algorithm for the true signal) or as a sampling of an unrepresentative population to infer the posterior distribution.
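To make the information theory angle concrete, here is a minimal sketch that compares the Shannon entropy of a set of responses against the maximum a scale could carry. The function names and both response lists are made up for illustration; they are not real survey data.

```python
import math
from collections import Counter


def shannon_entropy_bits(responses):
    """Shannon entropy (in bits) of the observed response distribution."""
    counts = Counter(responses)
    total = len(responses)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def scale_efficiency(responses, n_points):
    """Observed entropy as a fraction of the maximum for an n-point scale."""
    return shannon_entropy_bits(responses) / math.log2(n_points)


# Hypothetical data: every point of a 10 point scale used equally often.
uniform = list(range(1, 11)) * 10

# Hypothetical data: endpoints avoided, answers piled up around 5.
clustered = [5] * 40 + [4] * 20 + [6] * 20 + [3] * 10 + [7] * 10

print(scale_efficiency(uniform, 10))
print(scale_efficiency(clustered, 10))
```

The clustered responses use only a fraction of the bits the scale could carry, which is one way to read "lossy compression" here.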

A 100 point scale has the same problems as above, and in addition suffers from a lack of consistency (reproducibility) - respondents are likely to give a product a different score (say a 92 and 94) when asked about the same product multiple times. This will frequently lead to non-parametric rank reversals, which 1) prove that a 100 point scale is not a "metric" and 2) show that the amount of information is further reduced at higher optionality.
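One way to check for rank reversals, as a rough sketch: have the same respondent score the same products twice and count the pairs whose order flips. The function name, product names and scores below are hypothetical.

```python
from itertools import combinations


def count_rank_reversals(first, second):
    """Count product pairs whose relative order flips between two sessions.

    `first` and `second` map product name -> score from the same respondent.
    Ties in either session are not counted as reversals.
    """
    reversals = 0
    for a, b in combinations(first, 2):
        diff_first = first[a] - first[b]
        diff_second = second[a] - second[b]
        if diff_first * diff_second < 0:  # opposite signs: order flipped
            reversals += 1
    return reversals


# Hypothetical scores on a 100 point scale, two sessions.
session_1 = {"A": 92, "B": 93, "C": 80}
session_2 = {"A": 94, "B": 91, "C": 81}

print(count_rank_reversals(session_1, session_2))
```

Here A and B swap places between sessions even though neither score moved by more than two points.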

Thus - the discrete scales that work best are:

A) 1 - 7

B) 1 - 13

as they both do not suffer from avoidance of the end points, both have no selectable mid-point (forcing respondents to choose a point above or below the median), and are highly replicable (very few respondents will switch rank orders).

One place I worked we changed the scale from 1-5 to qualitative descriptors (I don't remember the exact words but something like "poor, average, good, great, perfect") and it significantly increased the information we got. Previously we almost never got any scores other than 1,4,5. Afterwards we started seeing more 2s and 3s. It seemed that people only needed one value for "bad" so putting "average" at 2 was very helpful.
I must be missing something... why isn't 4 a midpoint of 1-7 and 6 a midpoint of 1-13? I'd think you would want an even number of points to prevent over-selection (like 0-7 or 1-8).
The psychological effect is "tendency towards the mean", and in this case, the mid-point isn't the mean.

Forcing individuals to choose a score above or below the mean yields a better sampling of the true distribution in this case.

What? 1+2+3+4+5+6+7 = 28. 28 / 7 = 4. So 4 is both the midpoint and the mean?
Perhaps people don't differentiate between 1-7 and 0-7 and assume that 3.5 is always going to be the midpoint?
That is correct.
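For anyone who wants to check the arithmetic, a quick sketch (the helper name is just for illustration):

```python
from statistics import mean, median


def scale_center(low, high):
    """Return (mean, median) of the integer points on a low..high scale."""
    points = list(range(low, high + 1))
    return mean(points), median(points)


print(scale_center(1, 7))   # seven points, 1 through 7
print(scale_center(0, 7))   # eight points, 0 through 7
print(scale_center(1, 13))  # thirteen points, 1 through 13
```

On 1-7 and 1-13 the mean and the midpoint coincide on a selectable point (4 and 7). Only the eight point 0-7 scale puts the center at 3.5, between two options.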
Perfect, I’m screenshotting this answer and showing it to my friends when they get weird when I ask them how much they like their beer on a scale of 1-7.

Because science.

There is research into this - your intuition is pretty much in line with best practices (though 7 isn’t magic compared to 5 or anything). Eg http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.839...
While this might be very true, I'm not sure it's in any way related to Miller's Law.

The 7 +/-2 rule is, in part, about the design and layout of information so as to better aid the end user in leveraging the way their memory stores information spatially.

In short, it's about how to best design a system to give information TO a user.

What you're talking about is pretty much the reasoning behind methods like 'Fist of Five', 'five star' or '5 face' voting systems. These are methods well grounded in psychology and I might argue going up to 7 is unnecessary.

The International Baccalaureate uses a 7 point scale: http://www.isnsz.com/media/images/Assessment_6.width-800.png

I generally like my scales to be logarithmic. At work, I made the following scale to explain difficulty of technical items to non-technical people. It works quite well:

  Effort Scale

  1: Easy peasy
  2: Trivial but time-consuming
  3: Some invention required and non-trivial
  4: Invention required
  5: Lot of invention required 
  and for special occasions:
  6: This is a whole new startup
Yelp and Uber ratings, and other 5-star ratings, bother me. There is no generally agreed-upon standard. Sometimes people give 5 stars only for exceptional service; most of the time, however, people give 5 stars if they weren't wronged in any way. It's a mess! Also, most people aren't qualified to differentiate between five different levels of service/food. It should either be a generally agreed-upon logarithmic scale, or it should be a yay-meh-nay scale with no guilt in giving a meh.
With Uber I considered a 4 to be "nothing wrong", and reserve 5 for "amazing service above and beyond".

However, it seems that drivers who have an average of, say, "4.2", despite being perfectly fine and giving above and beyond service 1 in 5 times, get kicked off. This means that 5 is acceptable and there's no way to mark above acceptable.

I'm odd, I don't expect every trip I get to be above average. That's clearly not possible.

I'm the same way. My instinctive reading of a 5 star system maps it onto a bell curve. So if I received exactly the service I expected, with zero complaints, that's obviously a 3 out of 5. 5 stars means two standard deviations above my expectations. It feels to me like a company that gets nothing but 4 and 5 star reviews should be taking issue with their marketing department for setting expectations way too low.

But in an Uber, giving less than a perfect score can get someone fired. So it's 5 stars unless the driver literally spits on me, and then it might be 5 stars and a complaint. Anything lower than a 5 star rating feels unethical, like stiffing a waiter on a tip.

What I can't figure out is how this is of any use to Uber. They have created a "metric" where a large chunk of their customers regard answering honestly as a social faux pas, at best, so what do they think they're measuring?

People know this, and the "real" Uber rating scale is a ten point scale between 4 and 5. Same with Amazon product reviews. Anything under 4 doesn't get a look.

I think a simple thumbs up/thumbs down would be just as effective.

That 4.2 thing really disturbed me. Most people driving for Uber aren't doing it because they want to, but because their alternative is nothing. I couldn't, in good conscience, give out a rating that puts someone's livelihood at risk.

I only gave out one 3 star, but that was because the driver was very bad, though not taxi-driver bad, which is what I would consider 1-star.

This is very close to one of the key criticisms of Net Promoter Score that came up when we were looking at implementing it for a client.

Basically it uses an 11 point scale and your net promoter score is the proportion of 0-6 (negatives) subtracted from the positives (8 or 9-10). I may have got ranges wrong but the idea is the same. This was touted as "the only number you need to improve" for business success.
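For reference, the commonly published NPS definition on the 0-10 scale counts 9-10 as promoters, 7-8 as passives and 0-6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch, with a hypothetical function name and made-up responses:

```python
def net_promoter_score(scores):
    """NPS from 0-10 responses: % promoters (9-10) minus % detractors (0-6).

    Passives (7-8) only count toward the total number of responses.
    """
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical responses: two promoters, two passives, two detractors.
print(net_promoter_score([10, 9, 8, 7, 6, 0]))
```

Note that a 7 and an 8 move the score exactly the same way, as do a 0 and a 6, so much of the 11 point resolution is thrown away in the final number.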

Chile's education system uses a 1-7 scale for grades
It could be argued that most western systems do also; anything below a D is a fail. Depending on how you look at it, we either have a 5 or 8 level system if +/- is used.
I'm not sure why you were DV'd, but in grad school in the US a C is considered failing.
In the UK a C would be considered a good grade, even with the grade inflation. At UK unis a third, aka a gentleman's degree, is the failing grade.

When I was at school in the UK, getting an A meant you were really good, and a triple A at A level was super rare and less than you would need to get into Oxbridge

0-6 FTW