
That seems more or less what I said in the comment you replied to: the prediction of individual votes remains quite bad even if the state is taken into account. That's why the relative reduction in MSE is low. That's why the R² is low. I don't think there is any paradox.

I was replying to someone who claimed that "R2 is not the correct measure to use. This article is a perfect example of the principle that simply doing math and getting results is not necessarily meaningful." I've not seen any comment from anyone getting "different results" with a different measure.

Edit: You used var(...), which includes a factor of N/(N-1), so it doesn't give exactly the total sum of squares divided by N.

The example dataframe contains 40 observations (20 per state), and you get a higher variance estimate for the subsamples than for the aggregate sample. But if you put together a few copies of the data (for example with "data <- rbind(data, data, data, data, data)"), even the adjusted (unbiased) estimator of the variance is lower for the states.
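A rough sketch of that effect. The original data frame isn't shown in this thread, so the one below is an assumption: 20 observations per state, with 9 of 20 ones in state 0 and 11 of 20 in state 1, chosen because it matches the variances printed further down.

```r
# ASSUMPTION: hypothetical reconstruction of the example data frame.
# 20 rows per state; 9/20 ones in state 0 and 11/20 ones in state 1.
data <- data.frame(
  state = rep(c(0, 1), each = 20),
  pref  = c(rep(c(1, 0), times = c(9, 11)),
            rep(c(1, 0), times = c(11, 9)))
)

# var() divides by N-1, so the correction is larger for the
# 20-row subsample than for the 40-row aggregate.
var_total  <- var(data$pref)
var_state0 <- var(data[data$state == 0, "pref"])
print(var_state0 > var_total)          # within-state estimate is higher

# Stack five copies: same proportions, but N grows and the
# N/(N-1) factors shrink towards 1.
big <- rbind(data, data, data, data, data)
var_total_big  <- var(big$pref)
var_state0_big <- var(big[big$state == 0, "pref"])
print(var_state0_big < var_total_big)  # within-state estimate is now lower
```

Nothing about the data changes between the two comparisons except N, so the flip comes entirely from the N/(N-1) adjustment.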

You can calculate the "exact" values yourself with mean((x-mean(x))^2) or by undoing the adjustment:

  > var(data$pref)*39/40
  [1] 0.25
  > var(data[data$state==0, "pref"])*19/20
  [1] 0.2475
  > var(data[data$state==1, "pref"])*19/20
  [1] 0.2475

> when the variances inside any state are bigger that the total variance

They are not. But you're right that such a small difference shows that dividing the population into groups is of little value for prediction, and that's why the R² value is small.
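To make the link between the small variance difference and the small R² concrete, here is a minimal sketch. The data frame is an assumption (the real one isn't shown here): 20 rows per state, 9 of 20 ones in state 0 and 11 of 20 in state 1, which gives variances of 0.25 and 0.2475. pop_var is a helper name I made up for the unadjusted variance.

```r
# ASSUMPTION: hypothetical data consistent with the variances 0.25 and 0.2475.
data <- data.frame(
  state = rep(c(0, 1), each = 20),
  pref  = c(rep(c(1, 0), times = c(9, 11)),
            rep(c(1, 0), times = c(11, 9)))
)

# Unadjusted variance: mean squared deviation, no N/(N-1) factor.
pop_var <- function(x) mean((x - mean(x))^2)

total_var <- pop_var(data$pref)

# Average within-state variance. A plain mean of the group variances is
# only valid here because both states have the same number of rows.
within_var <- mean(tapply(data$pref, data$state, pop_var))

# R² as the relative reduction in MSE from predicting each state's mean
# instead of the overall mean.
r2_manual <- 1 - within_var / total_var

# Cross-check against a linear model with state as the only predictor.
r2_lm <- summary(lm(pref ~ factor(state), data = data))$r.squared

print(c(total = total_var, within = within_var,
        r2_manual = r2_manual, r2_lm = r2_lm))
```

Under these assumed numbers the within-state variance is only 1% below the total variance, and the R² is that same 1%.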