by justk
709 days ago
The math is correct; I am referring to your comment:
>>
R² is a measure like any other. In this case it measures the relative reduction in MSE - which is low because the prediction of individual votes remains quite bad even if the state is taken into account.
I may be reading too much into your comment, but it seems that you relate R² to the reduction in the prediction error within each state, so it seems you are thinking of the formula that computes R² as (average variance in each state)/(total variance). I think that is not correct in general, since at the very least it would require the total variance to be the sum of the variances in each state. If you based your ideas on that formula, then your intuition is not correct; that is my point. When I apply R² I am thinking of a multivariable linear model with continuous variables, and this is not the case here. I would measure this problem by how the entropy changes when we apply the information about the state, something like the cross entropy between the total distribution and the distribution by state.
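One possible reading of the entropy idea, as a rough sketch only: compare the entropy of a single vote before and after the state is known. The setup is an assumption made for illustration (two equally sized states with vote shares of 0.55 and 0.45), and the names `binary_entropy`, `state_shares` and `state_weights` are hypothetical. This computes the entropy reduction (the mutual information between state and vote), which is one way to formalize the suggestion, not necessarily the exact measure intended.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a 0/1 outcome with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Assumed setup: two equally sized states with these shares of 1-votes.
state_shares = [0.55, 0.45]   # P(vote = 1) within each state (assumption)
state_weights = [0.5, 0.5]    # fraction of all voters in each state (assumption)

# Overall vote share, ignoring the state.
overall_share = sum(w * p for w, p in zip(state_weights, state_shares))

# Entropy of one vote without and with knowledge of the state.
entropy_total = binary_entropy(overall_share)
entropy_given_state = sum(
    w * binary_entropy(p) for w, p in zip(state_weights, state_shares)
)

# Reduction in uncertainty from knowing the state, in bits.
information_gain = entropy_total - entropy_given_state

print(f"H(vote)         = {entropy_total:.4f} bits")
print(f"H(vote | state) = {entropy_given_state:.4f} bits")
print(f"reduction       = {information_gain:.4f} bits")
```

Under these assumed numbers the reduction is well under one percent of a bit, so this measure also reports that the state says little about an individual vote.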
The mean squared error of the baseline model, which doesn't include the state as a regressor, is 0.25 (it always predicts 0.5, so it's off by 0.5 in every case).
The mean squared error of the model which includes the state as a regressor is 0.2475 (it predicts 0.45 or 0.55 depending on the state; in both cases it's off by 0.45 with 55% probability and off by 0.55 with 45% probability, so 0.55 × 0.45² + 0.45 × 0.55² = 0.2475).
The mean squared error equals the variance of the residuals when the predictor is unbiased. The ratio of the sums of squares is the same as the ratio of the mean squared errors, because both are divided by the same number of observations, so R² = 1 - 0.2475/0.25 = 0.01.
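The arithmetic above can be checked with a short sketch. It assumes two equally sized states (inferred from the baseline prediction of 0.5, not stated explicitly), and the helper `expected_squared_error` is a hypothetical name introduced here for illustration.

```python
# Assumed setup: two equally sized states, where the share of 1-votes
# is 0.55 in one state and 0.45 in the other.
state_shares = [0.55, 0.45]   # P(vote = 1) within each state
state_weights = [0.5, 0.5]    # fraction of all voters in each state (assumption)

def expected_squared_error(p_true, prediction):
    """Expected squared error of `prediction` for a 0/1 vote with P(1) = p_true."""
    return p_true * (1 - prediction) ** 2 + (1 - p_true) * prediction ** 2

# Baseline model: always predict the overall mean vote share.
overall_share = sum(w * p for w, p in zip(state_weights, state_shares))
mse_baseline = sum(
    w * expected_squared_error(p, overall_share)
    for w, p in zip(state_weights, state_shares)
)

# State model: predict each state's own vote share.
mse_with_state = sum(
    w * expected_squared_error(p, p)
    for w, p in zip(state_weights, state_shares)
)

# R² as the relative reduction in MSE compared with the baseline.
r_squared = 1 - mse_with_state / mse_baseline

print(f"baseline MSE   = {mse_baseline:.4f}")    # 0.2500
print(f"with-state MSE = {mse_with_state:.4f}")  # 0.2475
print(f"R²             = {r_squared:.4f}")       # 0.0100
```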
Edit: http://brenocon.com/rsquared_is_mse_rescaled.pdf
"R2 can be thought of as a rescaling of MSE, comparing it to the variance of the outcome response."
https://dabruro.medium.com/you-mention-the-average-squared-e...
"Also it is worth mentioning that R-squared (coeff. of determination) is a rescaled version of MSE such that 100% is perfection and 0% implies the same MSE that you would get by simply always predicting the overall mean of the dataset."