by light_hue_1 674 days ago
That's not Simpson's paradox!

> In fact, while the new flow worked great on mobile, conversion was lower on desktop – an insight we missed when we combined these metrics.

> This phenomenon is known as Simpson's paradox – i.e. when experiments show one outcome when analyzed at an aggregated level, but a different one when analyzed by subgroups.

There's nothing strange about finding out that some groups benefit and others lose out when dividing up your data. You're looking at an average, and some parts are positive and others are negative. Where's the paradox there?

Simpson's paradox is when, overall, more button presses go with more purchases, but then you look at desktop vs mobile and find that within both desktop and mobile more clicks don't mean more purchases (or worse, more clicks mean fewer purchases).

That's why it's a paradox. The association between two variables exists at the aggregate level but doesn't exist or is backwards when you split up the population. It's not a statement about the average performance of something.
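To make the reversal concrete, here is a minimal sketch with made-up numbers (the per-user click and purchase counts are invented for illustration, not taken from the article). Within each device the correlation between clicks and purchases is negative, yet pooling the two groups gives a positive correlation, because desktop users sit higher on both axes.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-user data: within each device, more clicks go with fewer purchases.
mobile_clicks, mobile_purchases = [1, 2, 3, 4], [4, 3, 2, 1]
desktop_clicks, desktop_purchases = [6, 7, 8, 9], [9, 8, 7, 6]

r_mobile = pearson(mobile_clicks, mobile_purchases)
r_desktop = pearson(desktop_clicks, desktop_purchases)
# Pool both devices into one population and recompute.
r_all = pearson(mobile_clicks + desktop_clicks,
                mobile_purchases + desktop_purchases)

print(f"mobile r = {r_mobile:+.2f}, desktop r = {r_desktop:+.2f}, pooled r = {r_all:+.2f}")
```

The sign flips only because the groups differ in their baseline levels; nothing about either group's internal trend changes when you pool them.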

I would add a 7th A/B testing mistake to that list: not learning about basic probability, statistical tests, power, etc. Flying by the seat of your pants when statistics are involved always ends badly.

3 comments

> Simpson's paradox is when, overall, more button presses go with more purchases, but then you look at desktop vs mobile and find that within both desktop and mobile more clicks don't mean more purchases (or worse, more clicks mean fewer purchases).

How could more button presses lead to increased conversion rates while hiding this data when comparing desktop and mobile? Wouldn’t you see at least one device type demonstrating higher CVR to reflect aggregate CVR increase?

That's Simpson's paradox!

You can take data where, taken as a whole, presses go with more purchases, then split it into two halves (like mobile vs desktop) and show that in both halves presses go with fewer purchases.
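Here is a rough sketch of how that can happen in purchase-rate terms. All counts are hypothetical, chosen only to show the effect: desktop users buy far more often than mobile users, and most of the users who press the button are on desktop.

```python
# Hypothetical (users, purchases) counts by device and by whether the user pressed the button.
counts = {
    "desktop": {"pressed": (800, 400), "not_pressed": (200, 120)},
    "mobile":  {"pressed": (200, 10),  "not_pressed": (800, 80)},
}

def rate(users, purchases):
    """Purchase rate for one cell."""
    return purchases / users

def pooled_rate(action):
    """Purchase rate for an action with both devices combined."""
    users = sum(counts[device][action][0] for device in counts)
    purchases = sum(counts[device][action][1] for device in counts)
    return purchases / users

for device, by_action in counts.items():
    pressed = rate(*by_action["pressed"])
    not_pressed = rate(*by_action["not_pressed"])
    print(f"{device}: pressed {pressed:.0%} vs not pressed {not_pressed:.0%}")

print(f"overall: pressed {pooled_rate('pressed'):.0%} "
      f"vs not pressed {pooled_rate('not_pressed'):.0%}")
```

With these numbers pressing looks worse within desktop (50% vs 60%) and within mobile (5% vs 10%), but better overall (41% vs 20%), because the "pressed" group is dominated by high-converting desktop users. So no, neither device type has to show the higher rate that the aggregate shows.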

The whole paradox is that the intuition we have for averages doesn't apply to correlations.

I suggest checking out the Wikipedia page.

> I would add a 7th A/B testing mistake to that list: not learning about basic probability, statistical tests, power, etc. Flying by the seat of your pants when statistics are involved always ends badly.

This is where most tests fail, in my experience.

Everyone wants to run A/B tests because that’s what the big co’s are doing and they want to look like the sort of person BigCo might hire, but they’re making silly mistakes because stats is hard and not taught well at school.

Yep, I came here to say the same thing. The author has misremembered / misrepresented Simpson's paradox, which is much stronger than an aggregate hiding a group effect.