by kgwgk
709 days ago
Your calculation is not directly related to the model (and associated R²) discussed in the article, which is about predicting individual votes using the state as predictor, not state averages using the state as predictor. Maybe I'm completely missing your point, but the calculations in the blog post are, adapting your code (I think you meant mean where you wrote sum):
> data <- data.frame(state = rep(c(0, 1), each=20), pref = c(rep(0, 11), rep(1, 9), rep(0, 9), rep(1, 11)))
> mean(residuals(lm(pref ~ 0, data = data))^2) # null model [NOT IN THE BLOG POST]
[1] 0.5
> mean(residuals(lm(pref ~ 1, data = data))^2) # BASELINE intercept-only model
[1] 0.25
> mean(residuals(lm(pref ~ state + 0, data = data))^2) # predictor-only model [NOT IN THE BLOG POST]
[1] 0.34875
> mean(residuals(lm(pref ~ state, data = data))^2) # MODEL
[1] 0.2475
> summary(lm(pref ~ state, data = data))$r.squared # MODEL
[1] 0.01
The blog post compares what you call the "intercept-only" model (MSE 0.25) with the full model (MSE 0.2475), so the R² is (0.25-0.2475)/0.25=0.01. His calculation is slightly different: instead of computing 0.25-0.2475, he directly calculates 0.05^2=0.0025, which is the variance of the predictions (in this case the total variance 0.25 decomposes into the variance of the errors, 0.2475, plus the variance of the predictions, 0.0025).
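The decomposition can be checked on the same toy data. This is only a sketch: the variable names (`total_var`, `resid_var`, `pred_var`, and the two `r2_*` values) are made up for illustration, and the variances are computed by dividing by n rather than n - 1 so that they match the MSE figures in the session.

```r
# Same toy data as in the session: 2 states, 20 voters each
data <- data.frame(state = rep(c(0, 1), each = 20),
                   pref = c(rep(0, 11), rep(1, 9), rep(0, 9), rep(1, 11)))
fit <- lm(pref ~ state, data = data)

# Variances with divisor n (not n - 1), to match the MSEs
total_var <- mean((data$pref - mean(data$pref))^2)        # total variance
resid_var <- mean(residuals(fit)^2)                       # variance of the errors
pred_var  <- mean((fitted(fit) - mean(fitted(fit)))^2)    # variance of the predictions

# Two routes to the same R²
r2_from_mse  <- (total_var - resid_var) / total_var       # the (0.25-0.2475)/0.25 route
r2_from_pred <- pred_var / total_var                      # the 0.05^2/0.25 route

print(c(total_var, resid_var, pred_var))
print(c(r2_from_mse, r2_from_pred, summary(fit)$r.squared))
```

Both routes should agree with `summary(fit)$r.squared` up to floating-point error, because the fitted values and residuals of a least-squares fit with an intercept are uncorrelated.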
Either way, the point stands... the improvement in using a full linear model (that predicts 0.45 or 0.55, depending on state) is marginal compared to the baseline model that always predicts 0.50, as you demonstrate with your code.
To me, this doesn't seem paradoxical... the predictor is indeed providing little information over the "let's flip a coin to predict someone's voting preference" null/baseline predictor, since people's preferences (in aggregate) are almost equivalent to "flipping a coin".
Note: I did mean "sum", but it makes no difference, since the ratio between sums of squares equals the ratio between mean squares (both are divided by the same n).
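A quick way to see the sum-versus-mean equivalence on the toy data from the parent comment (a sketch only; `sse`, `sst`, `ratio_sums` and `ratio_means` are names chosen here for illustration):

```r
# Toy data from the parent comment: 2 states, 20 voters each
data <- data.frame(state = rep(c(0, 1), each = 20),
                   pref = c(rep(0, 11), rep(1, 9), rep(0, 9), rep(1, 11)))
n <- nrow(data)

# Sums of squared residuals for the full model and the intercept-only baseline
sse <- sum(residuals(lm(pref ~ state, data = data))^2)
sst <- sum(residuals(lm(pref ~ 1, data = data))^2)

# Dividing numerator and denominator by the same n leaves the ratio unchanged
ratio_sums  <- sse / sst
ratio_means <- (sse / n) / (sst / n)

print(c(ratio_sums, ratio_means, 1 - ratio_sums))
```

The last printed value, one minus the ratio, is the R² of the full model relative to the baseline.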