| HN Mirror


by huggah 4635 days ago

Data often contains structural features that aren't observed at first, and without them it appears to support a hypothesis only weakly. By exploiting that structure, we can see things much more clearly.

An example: say we want to know how a drug affects cognition. We give a simple test to a bunch of people on and off it, blinded, etc. The control group's average score is 74, and the test group's average score is 72. We can use a t test to see if there's a statistically significant difference, and find there isn't. From the group averages alone, we can't conclude anything about the drug.

Now imagine we have exactly that same data, but we were careful to give two tests to each person, one on the drug and one off it (in a random order, and different tests, of course). We take another look at the data and find that every single participant scored lower when they were on the drug. Even with a fairly small sample size this provides strong evidence that the drug impairs cognition, and it probably tells us quite a bit about how much.
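To make the contrast concrete, here is a rough sketch in Python (standard library only). The scores are invented for illustration and are not from any real study; they are chosen so the group means are 74 and 72 as in the example, and the helper names `welch_t`, `paired_t`, and `sign_test_p` are my own. It compares a t statistic that treats the two lists as independent groups with one that uses each person's own difference, and adds an exact sign test for the "everyone scored lower" observation.

```python
import math
import statistics

# Hypothetical scores for 8 participants (made up for illustration).
# Index i in both lists is the same person.
off_drug = [55, 62, 68, 74, 78, 81, 85, 89]  # mean 74
on_drug = [54, 59, 66, 72, 77, 78, 83, 87]   # mean 72

def welch_t(a, b):
    """t statistic treating a and b as two independent groups."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(a, b):
    """t statistic on the per-person differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def sign_test_p(a, b):
    """Two-sided exact sign test p-value, ignoring ties."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)  # people who scored higher off the drug
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print("unpaired t:", welch_t(off_drug, on_drug))
print("paired t:  ", paired_t(off_drug, on_drug))
print("sign test p:", sign_test_p(off_drug, on_drug))
```

With these made-up numbers the unpaired statistic comes out well below 1, because the spread between people swamps a 2-point gap, while the paired statistic is above 7, because each person's own drop is small but very consistent. The sign test needs no distributional assumptions at all: 8 out of 8 people scoring lower has a two-sided p-value of 2/256.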

The article is probably talking about multivariate regression; the more important number comes a few sentences later: "retention was related more strongly to manager quality than to seniority, performance, tenure, or promotions". So presumably they did the same sort of analysis, carefully pairing people who were similar in as many ways as possible, and found that good managers matter more than seniority for employee retention. The more variables you have, the more easily even large differences can hide in raw group averages.
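A toy illustration of how a difference can hide in raw averages, again in Python. The counts are entirely hypothetical and have nothing to do with the article's data. This is plain stratification on one variable (seniority), not a regression, so treat it as a sketch of the idea rather than the analysis the article's authors ran.

```python
# (manager quality, seniority) -> (employees retained, employees total)
# Hypothetical counts: good managers mostly have junior staff,
# bad managers mostly have senior staff.
counts = {
    ("good", "junior"): (54, 90),
    ("good", "senior"): (9, 10),
    ("bad", "junior"): (3, 10),
    ("bad", "senior"): (54, 90),
}

def raw_rate(manager):
    """Retention rate for one manager type, ignoring seniority."""
    retained = sum(r for (m, _), (r, n) in counts.items() if m == manager)
    total = sum(n for (m, _), (r, n) in counts.items() if m == manager)
    return retained / total

def stratum_gap(seniority):
    """Good-minus-bad retention rate among employees of equal seniority."""
    good_r, good_n = counts[("good", seniority)]
    bad_r, bad_n = counts[("bad", seniority)]
    return good_r / good_n - bad_r / bad_n

print("raw gap:   ", raw_rate("good") - raw_rate("bad"))
print("junior gap:", stratum_gap("junior"))
print("senior gap:", stratum_gap("senior"))
```

In this made-up table the raw gap between good and bad managers is about 6 percentage points, but among people of the same seniority it is about 30 points. Comparing like with like recovers the difference that the uneven mix of junior and senior staff was masking.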