|
|
|
|
|
by fjkdlsjflkds
709 days ago
|
|
> The R^2 used with a linear model requires a constant term, in this case the constant term or bias explains a lot about preferences (almost 50/50) so there is less information available for the slope term. This explains the paradox, basically. When you take the null model "preference = 50%" (i.e. intercept-only model), there simply isn't much residual variance left for the linear model to explain. That's why you get an R^2 = 1 if you use the "R^2 = rho(state, preference)^2" formula (you are ignoring the role of the intercept in explaining most of the variance, and exploiting the translation-invariance of the Pearson correlation) vs. you getting an R^2 = 0.01 when you use the (more correct) "R^2 = explained variance / total variance" formula. TL;DR: It makes sense to get a very low R^2 when it is the intercept and not the predictor that is explaining most of the variance. |
|
I'd say that there is still quite a lot of residual variance to explain. You need a baseline - the worst choice would be to predict 0 (or 1) and the mean squared error would be 0.5. Using 0.5 as baseline halves the mean squared error to 0.25.