by dxbydt
4924 days ago
It's just a stupid fraction.
Say you have a dataset, i.e. a sequence of (x,y) tuples. In OLS, you try to fit a line to the dataset. Then your manager wants to know how well the line fits your dataset. If it does a bang-up job, you say 100%, aka an rsquare of 1. If it does a shoddy job, you say 0%, aka an rsquare of 0. Hopefully your rsquare is much closer to 1 than to 0. Here, I just coded up a 10-liner for you:
https://gist.github.com/4333595
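For readers who can't open the link, here is a minimal Python sketch of the same idea. It is not the contents of the gist, just an illustration; the function name `ols_r_squared` and the two toy datasets are made up for this example. It fits y = a + b*x by least squares and reports rsquare as 1 minus (residual sum of squares / total sum of squares).

```python
def ols_r_squared(points):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r2).

    `points` is a sequence of (x, y) tuples. Hypothetical helper for
    illustration only; assumes the x values and y values are not all equal.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)

    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept

    # rsquare: the fraction of the variation in y that the line accounts for.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in points)
    r2 = 1 - ss_res / syy
    return a, b, r2


perfect = [(1, 2), (2, 4), (3, 6), (4, 8)]   # lies exactly on y = 2x
noisy = [(1, 2), (2, 1), (3, 4), (4, 3)]     # trends upward, but scattered

_, _, r2_perfect = ols_r_squared(perfect)
_, _, r2_noisy = ols_r_squared(noisy)
print(r2_perfect, r2_noisy)
```

The first dataset sits exactly on a line, so its rsquare is 1; the second is scattered around its fitted line, so its rsquare lands well below 1.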
|
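Correlation (in the sense of E[xy], which for zero-mean variables is the covariance) induces an inner product <x,y> on the set of zero-mean random variables. The regression coefficient for regressing x on y is precisely the projection coefficient <x,y>/<y,y>, and R^2 is precisely the Cauchy-Schwarz ratio <x,y>^2 / (<x,x><y,y>), i.e. the product of the two projection coefficients <x,y>/<y,y> and <x,y>/<x,x> between x and y. By Cauchy-Schwarz, that ratio always lies between 0 and 1.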
It is a theoretically natural measure of linear quality-of-fit. It has the added bonus of being equal to the ratio of modeled variance to total variance (variance being the square-norm of a random variable in the norm induced by the correlation inner product).
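A quick numerical check of those identities, as a sketch only: the helper names `center` and `inner` and the sample data are made up for illustration, and the inner product is taken to be the sample average of products of mean-centered values.

```python
def center(v):
    """Subtract the mean so the sample behaves like a zero-mean variable."""
    m = sum(v) / len(v)
    return [t - m for t in v]


def inner(u, v):
    """Sample version of <u,v> = E[uv] for zero-mean u and v."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


# Made-up sample data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
xc, yc = center(x), center(y)

# The two projection (regression) coefficients.
coef_y_on_x = inner(xc, yc) / inner(xc, xc)   # slope when regressing y on x
coef_x_on_y = inner(xc, yc) / inner(yc, yc)   # slope when regressing x on y

# R^2 as the Cauchy-Schwarz ratio, and as the product of the two coefficients.
r2_ratio = inner(xc, yc) ** 2 / (inner(xc, xc) * inner(yc, yc))
r2_product = coef_y_on_x * coef_x_on_y

# R^2 as modeled variance over total variance: the square-norm of the
# fitted (projected) values divided by the square-norm of y.
fitted = [coef_y_on_x * t for t in xc]
r2_variance = inner(fitted, fitted) / inner(yc, yc)

print(r2_ratio, r2_product, r2_variance)
```

All three numbers agree up to floating-point rounding, which is the claim above in concrete form.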
It's also very, very cheap to compute: one pass over the data, keeping only a handful of running sums. Though there are more practically useful measures of "predictive power", like mutual information, R^2 does an admirable job for an O(1)-space and O(num data)-time predictiveness metric.
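To make the cost claim concrete, here is a rough one-pass sketch. The class name `StreamingR2` is hypothetical, and the raw-sums formula used here is the simplest option, not the most numerically robust one (it can lose precision when the means are large relative to the spread). It keeps six numbers no matter how many points stream through.

```python
class StreamingR2:
    """Accumulate R^2 between x and y in one pass with O(1) memory.

    Illustrative sketch: stores only n and five running sums.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        # Constant work and constant memory per data point.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def r_squared(self):
        # Each term below is n^2 times the corresponding (co)variance;
        # the n^2 factors cancel in the ratio.
        cov = self.n * self.sxy - self.sx * self.sy
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        return cov * cov / (var_x * var_y)


acc = StreamingR2()
for x, y in [(1, 2), (2, 1), (3, 4), (4, 3)]:   # made-up data
    acc.update(x, y)
streamed_r2 = acc.r_squared()
print(streamed_r2)
```

Nothing in `update` depends on how many points came before, so memory stays constant and total time grows linearly with the data.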