| When I wrote "In this case it measures the relative reduction in MSE" I meant exactly that. The mean squared error of the baseline model, which doesn't include the state as a regressor, is 0.25 (it always predicts 0.5, so it's off by 0.5 in every case). The mean squared error of the model which includes the state as a regressor is 0.2475 (it predicts 0.45 or 0.55 depending on the state; in both cases it's off by 0.45 with 55% probability and off by 0.55 with 45% probability). That is a relative reduction of (0.25 - 0.2475) / 0.25 = 1%. The mean squared error is directly related to variance when the predictor is unbiased. The ratio of the sums of squares is the same as the ratio of the mean squared errors. Edit: http://brenocon.com/rsquared_is_mse_rescaled.pdf "R2 can be thought of as a rescaling of MSE, comparing it to the variance of the outcome response." https://dabruro.medium.com/you-mention-the-average-squared-e... "Also it is worth mentioning that R-squared (coeff. of determination) is a rescaled version of MSE such that 100% is perfection and 0% implies the same MSE that you would get by simply always predicting the overall mean of the dataset." |
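To make the arithmetic easy to check, here is a minimal sketch of the two MSEs and the resulting R^2. It assumes a binary 0/1 outcome and two equally likely states with P(outcome = 1) of 0.45 and 0.55; the names `p_by_state` and `expected_sq_error` are made up for illustration.

```python
# Assumption: binary outcome, two equally weighted states,
# with P(outcome = 1) equal to 0.45 in one state and 0.55 in the other.
p_by_state = [0.45, 0.55]


def expected_sq_error(prediction, p):
    """Expected squared error of a fixed prediction for a 0/1 outcome with P(1) = p."""
    return p * (1 - prediction) ** 2 + (1 - p) * prediction ** 2


# Baseline ignores the state and always predicts the overall mean, 0.5.
mse_baseline = sum(expected_sq_error(0.5, p) for p in p_by_state) / len(p_by_state)

# The second model predicts the state's own probability.
mse_with_state = sum(expected_sq_error(p, p) for p in p_by_state) / len(p_by_state)

# R^2 as the relative reduction in MSE compared with the baseline.
r_squared = 1 - mse_with_state / mse_baseline

print(f"baseline MSE   = {mse_baseline:.4f}")
print(f"with-state MSE = {mse_with_state:.4f}")
print(f"R^2            = {r_squared:.4f}")
```

Under those assumptions the MSEs come out at 0.25 and 0.2475, so R^2 is about 0.01.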
There must be a formula to compute R^2 from the variances both among states and inside states. In any case, when the variances inside any state are bigger than the total variance, that should imply that the feature that divides the population into groups is of little value for prediction, so it should have a small R^2 value.
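One candidate for such a formula comes from the law of total variance: total variance = variance between group means + average variance within groups. If the model simply predicts each group's mean, R^2 is the between-group share of the total. A rough sketch, assuming population (not sample) variances and a hypothetical helper name `r_squared_from_groups`:

```python
def r_squared_from_groups(weights, means, variances):
    """R^2 of a 'predict the group mean' model, from per-group summaries.

    weights   -- group sizes or probabilities (normalised here)
    means     -- mean outcome inside each group
    variances -- population variance of the outcome inside each group
    Assumes the total variance is nonzero.
    """
    total_weight = sum(weights)
    w = [x / total_weight for x in weights]

    grand_mean = sum(wi * m for wi, m in zip(w, means))
    # Average variance inside the groups (the part the grouping cannot explain).
    within = sum(wi * v for wi, v in zip(w, variances))
    # Variance of the group means around the grand mean (the explained part).
    between = sum(wi * (m - grand_mean) ** 2 for wi, m in zip(w, means))

    return between / (between + within)


# Illustrative numbers: two equally likely states with P(1) = 0.45 and 0.55,
# so each state's variance is p * (1 - p) = 0.2475.
example = r_squared_from_groups(
    weights=[0.5, 0.5],
    means=[0.45, 0.55],
    variances=[0.45 * 0.55, 0.55 * 0.45],
)
print(f"R^2 = {example:.4f}")
```

With these inputs the between-state variance is 0.0025 against a total of 0.25, so R^2 is about 0.01: large within-state variances relative to the spread of the state means give a small R^2.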