|
I once ran into this in the real world, a long time ago, as a data analyst. I was working at an e-commerce company called The Hut Group, and all year our marketing team had been saying that our marketing cost of goods sold (the percentage of our revenue we needed to spend on marketing) was declining across every product category. But at year end, the execs were shocked to find that our overall marketing cost of goods sold had almost doubled, from 10% to nearly 20%. The finance team asked me to double-check the marketing team's numbers, to see if there had been some funny math in the reporting. But the marketing team were totally right: marketing spend across the three main categories (games, beauty, and nutrition) had fallen in every one of them (~15% to ~10%, ~30% to ~25%, and ~50% to ~30% respectively). However, the mix of these product categories had shifted massively, with nutrition growing from roughly 10% of our total sales to nearly 50%. On net, that meant that while the marketing team had gotten more cost-efficient at selling every individual product category, growth in the nutrition category had vastly outstripped growth in all the others, and since nutrition was the most expensive category to market, the aggregate marketing cost percentage had gone up even though the team had improved every category. I then had the fun job of explaining the Yule-Simpson paradox to a bunch of accountants. |
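The arithmetic behind that story can be sketched in a few lines. This is a minimal illustration, not the company's actual books: the per-category rates and nutrition's 10% to 50% share come from the anecdote (all approximate), while the games/beauty split of the remaining revenue is an assumption invented for the sketch, so the blended totals only show the direction of the effect.

```python
# Marketing spend as a fraction of revenue, per category (approximate, from the anecdote).
rate_before = {"games": 0.15, "beauty": 0.30, "nutrition": 0.50}
rate_after = {"games": 0.10, "beauty": 0.25, "nutrition": 0.30}

# Share of total revenue per category. Nutrition's 10% -> 50% is from the anecdote;
# the games/beauty split is an assumption made up for this sketch.
mix_before = {"games": 0.80, "beauty": 0.10, "nutrition": 0.10}
mix_after = {"games": 0.25, "beauty": 0.25, "nutrition": 0.50}


def blended_rate(rate, mix):
    """Company-wide marketing cost: the revenue-weighted average of category rates."""
    return sum(rate[c] * mix[c] for c in rate)


total_before = blended_rate(rate_before, mix_before)
total_after = blended_rate(rate_after, mix_after)

for c in rate_before:
    print(f"{c:<10} {rate_before[c]:.0%} -> {rate_after[c]:.0%}")
print(f"{'blended':<10} {total_before:.1%} -> {total_after:.1%}")
```

Every category's rate drops, yet with these assumed mixes the blended rate climbs from about 20% to about 24%, purely because the weights moved toward the most expensive category.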
A network engineer took a trip to Indonesia or something (can't find the citation to confirm the exact tale), noticed the service was slow, and when they asked around everyone said "that's how it's always been." Basically, the local cellular networks are slow and the off-island fiber links are saturated. Back at the office, they decided to attack the problem by optimizing payload sizes. They did the work, cut download sizes in half, and shipped it. Latency metrics? Average and p95 latency actually increased after the work shipped to production.
How does an objectively good change make things worse? Well, the service had improved so much for those customers that they used it a lot more. Even with the lighter demand on bandwidth, their network latency to the datacenter was still worse than it was for typical US customers, so as more of these people realized the service sucked way less, they used it more and drove the aggregate numbers up.
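Here is a rough sketch of that effect with made-up numbers. Nothing below comes from the original story: the latencies, request counts, and the `p95` helper are all hypothetical, chosen only to show how every request can get faster while the aggregate mean and p95 get worse.

```python
import math
from statistics import mean


def p95(samples):
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * 95 / 100)
    return ordered[rank - 1]


# Hypothetical per-request latencies in ms: fast-network users, then slow-network users.
# Before: slow-network users barely use the service.
before = [100] * 990 + [2000] * 10
# After: every request is faster, but slow-network users send 20x more requests.
after = [90] * 990 + [1200] * 200

print(f"mean: {mean(before):.0f} ms -> {mean(after):.0f} ms")
print(f"p95:  {p95(before)} ms -> {p95(after)} ms")
```

Both groups improved, but the slow group went from about 1% of traffic to about 17%, which is enough to pull it into the p95 and more than double the mean.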
I have tons of these examples where a data team looks at a particular slice of request telemetry and comes to the wrong conclusion because they didn't model enough of the system, or controlled for the wrong (or too many) variables. The worst ones are the cyclic finger-pointing situations that Simpson's paradox can produce: app developers blame a regression on the server-side component while the server team blames the app team, often because the server and app release schedules accidentally aligned too well. In this case we have canary data to exonerate our side of the equation, but sometimes the problem lies in even deeper places, like app updates from an entirely different app.