| That's not Simpson's paradox! > In fact, while the new flow worked great on mobile, conversion was lower on desktop – an insight we missed when we combined these metrics. > This phenomenon is known as Simpson's paradox – i.e. when experiments show one outcome when analyzed at an aggregated level, but a different one when analyzed by subgroups. There's nothing strange about finding out that some groups benefit and others lose out when dividing up your data. You're looking at an average, and some parts are positive and others are negative. Where's the paradox there? Simpson's paradox is when, in aggregate, more button presses are associated with more purchases, but then you look at desktop vs mobile and find that within both desktop and mobile more clicks don't mean more purchases (or worse, more clicks mean fewer purchases). That's why it's a paradox. The association between two variables exists at the aggregate level but disappears or reverses when you split up the population. It's not a statement about the average performance of something. I would add a 7th A/B testing mistake to that list: not learning about basic probability, statistical tests, power, etc. Flying by the seat of your pants when statistics are involved always ends badly. |
How could more button presses lead to increased conversion rates while hiding this data when comparing desktop and mobile? Wouldn’t you see at least one device type demonstrating higher CVR to reflect aggregate CVR increase?
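One way to see how this can happen is with a toy example. The counts below are made up purely for illustration (they are not from the article), and the names `new_flow`, `old_flow`, `segment_rates` and `aggregate_rate` are hypothetical. The trick is that the two variants have very different desktop/mobile mixes, so the aggregate is dominated by different segments for each variant:

```python
# Hypothetical (conversions, visitors) counts, invented for illustration only.
# Note the unbalanced mix: new_flow traffic is mostly mobile, old_flow mostly desktop.
data = {
    "new_flow": {"desktop": (81, 87), "mobile": (192, 263)},
    "old_flow": {"desktop": (234, 270), "mobile": (55, 80)},
}


def segment_rates(variant):
    """Conversion rate per device segment for one variant."""
    return {seg: conv / vis for seg, (conv, vis) in data[variant].items()}


def aggregate_rate(variant):
    """Conversion rate for one variant with all segments pooled together."""
    conversions = sum(conv for conv, _ in data[variant].values())
    visitors = sum(vis for _, vis in data[variant].values())
    return conversions / visitors


if __name__ == "__main__":
    for variant in data:
        rates = segment_rates(variant)
        print(
            f"{variant}: desktop={rates['desktop']:.3f} "
            f"mobile={rates['mobile']:.3f} "
            f"aggregate={aggregate_rate(variant):.3f}"
        )
```

With these numbers, `new_flow` has the higher conversion rate on desktop and on mobile, yet `old_flow` has the higher pooled rate, because most of `old_flow`'s traffic sits in the high-converting desktop segment. So no, a higher aggregate CVR does not guarantee that any single device type shows a higher CVR, at least when the segment mix differs between the groups being compared.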