by kgwgk 756 days ago

Your calculation is not directly related to the model (and associated R²) discussed in the article, which is about predicting individual votes using the state as predictor, not state averages using the state as predictor.

Maybe I'm completely missing your point but the calculations in the blog post are, adapting your code (I think you meant mean where you wrote sum):

  > data <- data.frame(state = rep(c(0, 1), each=20), pref = c(rep(0, 11), rep(1, 9), rep(0, 9), rep(1, 11)))

  > mean(residuals(lm(pref ~ 0, data = data))^2) # null model [NOT IN THE BLOG POST]
  [1] 0.5

  > mean(residuals(lm(pref ~ 1, data = data))^2) # BASELINE intercept-only model
  [1] 0.25

  > mean(residuals(lm(pref ~ state + 0, data = data))^2) # predictor-only model [NOT IN THE BLOG POST]
  [1] 0.34875

  > mean(residuals(lm(pref ~ state, data = data))^2) # MODEL
  [1] 0.2475

  > summary(lm(pref ~ state, data = data))$r.squared # MODEL
  [1] 0.01

The blog post is about what you call the "intercept-only" model (MSE 0.25) and the full model (MSE 0.2475); the R² is (0.25-0.2475)/0.25 = 0.01. His calculation is slightly different: instead of 0.25-0.2475 he directly calculates 0.05^2 = 0.0025, which is the variance of the predictions (in this case the total variance 0.25 can be decomposed as the variance of the errors 0.2475 plus the variance of the predictions 0.0025).
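
For anyone who wants to check the decomposition, here is a minimal sketch on the same toy data. The helper `pvar` is a made-up name for the population variance (dividing by n rather than n - 1), chosen so the numbers line up with the MSE figures above; it is not from the blog post.

```r
# Same toy data as above: 20 voters per state, 45% vs 55% preferring option 1.
data <- data.frame(
  state = rep(c(0, 1), each = 20),
  pref  = c(rep(0, 11), rep(1, 9), rep(0, 9), rep(1, 11))
)

fit <- lm(pref ~ state, data = data)

# Hypothetical helper: population variance (divide by n, not n - 1).
pvar <- function(x) mean((x - mean(x))^2)

total_var <- pvar(data$pref)       # variance of the outcome
pred_var  <- pvar(fitted(fit))     # variance of the predictions (0.45 or 0.55)
resid_var <- pvar(residuals(fit))  # variance of the errors

# With an intercept in the model, total = predictions + errors.
stopifnot(isTRUE(all.equal(total_var, pred_var + resid_var)))

# Two routes to the same R^2.
r2_from_mse  <- (total_var - resid_var) / total_var
r2_from_pred <- pred_var / total_var
stopifnot(isTRUE(all.equal(r2_from_mse, r2_from_pred)))
```

Both `r2_from_mse` and `r2_from_pred` should agree with `summary(fit)$r.squared`.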

fjkdlsjflkds 756 days ago

(After re-reading the blog post with more care...) you are right, and thanks for the correction.

Either way, the point stands... the improvement in using a full linear model (that predicts 0.45 or 0.55, depending on state) is marginal compared to the baseline model that always predicts 0.50, as you demonstrate with your code.

To me, this doesn't seem paradoxical... the predictor is indeed providing little information over the "let's flip a coin to predict someone's voting preference" null/baseline predictor, since people's preferences (in aggregate) are almost equivalent to "flipping a coin".

note: I meant "sum", but it's the same, since the ratio between sums of squares is equivalent to the ratio between mean squares
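
A quick sketch of why sum and mean give the same ratio, assuming the same toy data frame as in the parent comment (the variable names here are only for illustration):

```r
data <- data.frame(
  state = rep(c(0, 1), each = 20),
  pref  = c(rep(0, 11), rep(1, 9), rep(0, 9), rep(1, 11))
)

res_base <- residuals(lm(pref ~ 1, data = data))      # intercept-only model
res_full <- residuals(lm(pref ~ state, data = data))  # full model

# The common factor 1/n cancels, so both ratios are identical.
ratio_sums  <- sum(res_full^2)  / sum(res_base^2)
ratio_means <- mean(res_full^2) / mean(res_base^2)
stopifnot(isTRUE(all.equal(ratio_sums, ratio_means)))

# Either ratio gives the same R^2.
r2 <- 1 - ratio_sums
```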

kgwgk 756 days ago

> Either way, the point stands... the improvement in using a full linear model (that predicts 0.45 or 0.55, depending on state) is marginal compared to the baseline model that always predicts 0.50

Yes, I think we don't disagree. I was just puzzled by the "little variance left to explain" remark.

> note: I meant "sum", but it's the same, since the ratio between sums of squares is equivalent to the ratio between mean squares

You're right, sum of squares made sense if it was just for the ratio.

fjkdlsjflkds 756 days ago

Thanks for taking the time to clarify my confusion.

It's not that there is "little variance left to explain", but actually that (no matter what) there will always be too much variance left to be explained, when the response is Bernoulli-distributed and the parameter is not too far from 0.5 (i.e., the data generating process is like flipping a slightly loaded coin).

If you use the expected value to predict the Bernoulli variable, you will always be somewhat wrong (0.45 and 0.55 are both far from 0 and from 1, which are the only possible responses).

If you use a binary response as the prediction, you will quite often be very wrong, even if you are right on average, and even if your prediction is to generate Bernoulli-distributed samples from the exact same distribution (i.e., you know exactly how the coin is loaded/biased and you can exactly replicate its data generation process).
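
The two prediction strategies above can be put into numbers with a rough sketch. It assumes a coin with p = 0.55 (one of the two states in the toy data); the variable names are illustrative only. Predicting the expected value gives a mean squared error of p(1-p), and predicting with an independent draw from the same coin misses with probability 2p(1-p).

```r
p <- 0.55  # assumed bias of the coin, matching one of the two states

# 1) Predict the expected value p every time: mean squared error.
#    Outcome 1 (prob p) has error (1 - p); outcome 0 (prob 1 - p) has error p.
mse_expected <- p * (1 - p)^2 + (1 - p) * p^2   # simplifies to p * (1 - p)

# 2) Predict by drawing from the same Bernoulli(p): probability of a miss.
#    A miss is (truth 1, guess 0) or (truth 0, guess 1).
miss_random <- 2 * p * (1 - p)

# Monte Carlo check of (2): two independent draws from the same coin.
set.seed(1)
n <- 1e5
truth <- rbinom(n, 1, p)
guess <- rbinom(n, 1, p)
miss_simulated <- mean(truth != guess)
```

With p = 0.55 this gives a mean squared error of 0.2475 for the expected-value prediction, and a perfectly replicated coin still misses about 49.5% of the time.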

So... yeah... no "paradox" ;)
