| This might be too off-topic, but just kill it if you think it is. Otherwise, here goes: I have a question about regression to the mean. Suppose you have a set of pairs (a,b) corresponding to students in a class. a = the student's score on the first midterm, b = score on second midterm. If you plot the pairs with a on x-axis, b on y-axis, then get the least-squares line, you have an upward sloping line. The line slope should be less than 1, indicating regression to the mean. If you plot b on x-axis, a on y-axis, the slope is necessarily now greater than 1. But I fail to see what has changed in the analysis -- a and b are both just supposed to be samples from the same distribution, right? This has been driving me crazy, so I'd love some help. Thank you! |
x < mean => y > x (on average)
x > mean => y < x (on average)
That holds if the scores are normalized. Regression to the mean says that most people move towards the mean in subsequent games/attempts/whatever.
But I fail to see what has changed in the analysis -- a and b are both just supposed to be samples from the same distribution, right?
Not at all. b is not independent of a; that's the whole point of regression to the mean. If you take ordered pairs where there is no connection between a and b, then you won't get any regression to the mean; you'll get points essentially randomly placed on the plane.
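On the slope question itself, a quick simulation may help. This is only a sketch under an assumed model that is not from the thread: each student has a fixed "ability" and each midterm score is that ability plus independent noise. The names `ols_slope` and `simulate_scores` and all the numbers (mean 70, standard deviation 10, 5000 students) are made up for illustration. The least-squares slope of y on x is cov(x, y) / var(x), so swapping the axes gives cov(x, y) / var(y), not the reciprocal of the first slope. When a and b have similar spread, both slopes come out below 1.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs: cov(x, y) / var(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_scores(n=5000, seed=0):
    """Hypothetical model: score = fixed ability + independent exam noise."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n):
        ability = rng.gauss(70, 10)           # assumed ability distribution
        a.append(ability + rng.gauss(0, 10))  # first midterm
        b.append(ability + rng.gauss(0, 10))  # second midterm
    return a, b


a, b = simulate_scores()
slope_b_on_a = ols_slope(a, b)  # a on x-axis, b on y-axis
slope_a_on_b = ols_slope(b, a)  # b on x-axis, a on y-axis

print("slope of b regressed on a:", round(slope_b_on_a, 3))
print("slope of a regressed on b:", round(slope_a_on_b, 3))
print("reciprocal of the first slope:", round(1 / slope_b_on_a, 3))
```

Under this model both fitted slopes should land near 0.5, while the reciprocal of the first slope is near 2. The line you get by flipping the axes of the first plot is a different line from the least-squares fit of a on b, because the first fit minimizes vertical errors and the second minimizes horizontal ones. So the analysis is symmetric: a regresses toward the mean given b just as b does given a.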