by trendia 3568 days ago
The problem with most of these is not adjusting for cohort changes. For instance, in the SAT example, the author writes:

> In the 1980s, the Reagan administration seized on a report called A Nation at Risk, which claimed that the US was on the verge of collapse due to its falling SAT scores.

Suppose that low-income individuals start to take the SAT in 1980 whereas they didn't in 1970. The wrong way to analyze SAT scores is to evaluate:

sum over cohorts P(SAT Score | cohort, Y) P(cohort | Y) = P(SAT Score | Y)

where Y is the year. For instance, you might compare the overall average score in 1980 vs. 1970. Doing so will show a decrease in the average SAT score because of the increase in low-income individuals taking the SAT, not because the high-income individuals are doing worse. (This assumes that low-income people have less access to SAT training materials, and that those materials affect the score.)
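To make the composition effect concrete, here is a minimal Python sketch. Every number in it is made up for illustration (these are not real SAT figures), and the names `cohort_mean`, `takers`, and `pooled_mean` are hypothetical. Each cohort's mean is held fixed across both years, so any change in the pooled average comes purely from who is taking the test.

```python
# Hypothetical numbers, made up purely for illustration (not real SAT data).
# Each cohort's mean score is identical in both years; only the mix of
# test-takers changes.
cohort_mean = {"high_income": 1100, "low_income": 900}

# Number of test-takers per cohort in each year (also invented).
takers = {
    1970: {"high_income": 1000, "low_income": 0},
    1980: {"high_income": 1000, "low_income": 1000},
}

def pooled_mean(year):
    """Average over all test-takers in a year: cohort means weighted by headcount."""
    counts = takers[year]
    total = sum(counts.values())
    return sum(cohort_mean[c] * n for c, n in counts.items()) / total

for year in (1970, 1980):
    print(year, pooled_mean(year))
# prints:
# 1970 1100.0
# 1980 1000.0
```

Under these assumed numbers the pooled average drops by 100 points even though neither cohort's average moved at all.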

The correct way is to only compare scores within a cohort:

P(SAT Score | cohort, 1980) > P(SAT Score | cohort, 1970)

That is, did the same cohort do better in 1980 vs. 1970?

(There might still be some differences between the cohorts in 1980 vs. 1970. Maybe the low-income individuals who took it in 1970 had high confidence in school, whereas the 1980s kids were from a broader background.)


The article addresses that a couple of paragraphs down. That's her whole point. To quote:

"The Nation at Risk report that started it all turned out to be bullshit, by the way -- grounded in another laughable statistical error. Sandia Labs later audited the findings from the report and found that the researchers had failed to account for the ballooning number of students who were taking the SATs, bringing down the average score.

In other words: SATs were falling because more American kids were confident enough to try to go to college: the educational system was working so well that young people who would never have taken an SAT were taking it, and the larger pool of test-takers was bringing the average score down."

I was converting her text into math-ish notation. She's saying that

P(SAT Score > 700 | 1970) > P(SAT Score > 700 | 1980)

is inaccurate, and that we should instead use:

P(SAT Score > 700 | cohort, 1970) ~ P(SAT Score > 700 | cohort, 1980)
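As a rough illustration of the difference between those two comparisons, here is a small Python sketch. The counts are invented for the example (not real data), and `counts` and `p_above_700` are hypothetical names. It computes the share scoring above 700 both pooled over everyone and within a single cohort.

```python
# Hypothetical counts, invented for illustration only (not real SAT data).
# (cohort, year) -> (number of test-takers, number scoring above 700)
counts = {
    ("high_income", 1970): (1000, 200),
    ("high_income", 1980): (1000, 200),
    ("low_income", 1970): (0, 0),
    ("low_income", 1980): (1000, 50),
}

def p_above_700(year, cohort=None):
    """Share scoring above 700 in a year; pooled over all cohorts when cohort is None."""
    rows = [v for (c, y), v in counts.items() if y == year and cohort in (None, c)]
    n_takers = sum(n for n, _ in rows)
    n_above = sum(k for _, k in rows)
    return n_above / n_takers

# Pooled: P(SAT Score > 700 | year)
print(p_above_700(1970), p_above_700(1980))
# Within one cohort: P(SAT Score > 700 | cohort, year)
print(p_above_700(1970, "high_income"), p_above_700(1980, "high_income"))
# prints:
# 0.2 0.125
# 0.2 0.2
```

With these assumed counts the pooled share falls from 0.2 to 0.125, while the within-cohort share stays at 0.2 in both years.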