by haberman 3890 days ago
> Suppose you had a classifier that achieves 1.0 R^2; then since it perfectly predicts each school's expected value, it'll assign each school a score of 0. I'm pretty suspicious of an approach where the results get worse with better predictive power.

If I'm understanding correctly, that result would indicate a world where the college you attend has no effect on your earning power, i.e. choose any college you want, because you'll earn the same amount regardless of which one you choose.

This would only apply to colleges that people in your demographic group actually attended, though. If the dataset doesn't contain any information about people like you who went to Harvard, then maybe Harvard would indeed increase your earning potential if there were a way for you to actually go there.


I'm not saying that each college you attend has no effect on earning power. It's just that I can perfectly predict the effect of each college on your earning power. Does that make sense? If I have an oracle that tells you

"if you go to Harvard, you will make $80,000, if you go to MIT, you will make $86,000",

and the oracle is exactly correct, then under this model, The Economist assigns every college a score of 0.

I think you are missing the key ingredient in the analysis.

The Economist is attempting to build such an oracle via statistical regression. HOWEVER, the Oracle is intentionally limited in input to a specific list of things: SAT scores, sex ratio, race breakdown, size, public or private, earning power in the city where it is located, etc.

The things that are omitted constitute the actual value the University brings to the table: quality of teachers, instruction, organizations on campus, etc. (1)

So however far off the model's prediction is for a given University, that gap must be explained by all the missing inputs, i.e. largely how "good" the University is.

If the Oracle were able to perfectly predict your earning power given that limited set of inputs, then it would basically mean that a University is completely defined by SAT scores of students, sex ratio, race, etc. and there's absolutely no value they add or subtract beyond that. That would be a very, very interesting result. But you can see why it's unlikely.
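To make that concrete, here is a rough sketch of residual-based scoring. It is not The Economist's actual model: the single input (average SAT), the five colleges, the dollar figures, and the `hidden_quality` numbers are all made up for illustration, and the helper names are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residual_scores(xs, ys):
    """Score each college as observed earnings minus the model's prediction."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical colleges: average SAT is the only input the model is allowed to see.
sat = [1000, 1100, 1200, 1300, 1400]

# Case 1: earnings are fully determined by the allowed input.
earnings_explained = [30000 + 40 * s for s in sat]

# Case 2: same baseline plus a hidden "quality" effect the model never sees.
# These values are deliberately chosen to be uncorrelated with SAT, so the
# residuals recover them exactly; a correlated effect would be partly
# absorbed into the fitted slope instead.
hidden_quality = [1000, -2000, 2000, -2000, 1000]
earnings_with_quality = [e + q for e, q in zip(earnings_explained, hidden_quality)]

scores_perfect = residual_scores(sat, earnings_explained)
scores_quality = residual_scores(sat, earnings_with_quality)

print(scores_perfect)  # every score is zero: the inputs explain everything
print(scores_quality)  # the scores are whatever the inputs could not explain
```

In the first case every college scores 0 because the limited inputs explain everything. In the second, the scores are exactly the part the inputs cannot see.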

Hopefully this makes sense?

(1) Of course it's possible that there are factors like "how many trees on campus" or "how many vowels are in the name" which might also affect earnings. But we can probably agree that they're less likely to be important than the aforementioned ones ("quality of instruction", etc.).

> and the oracle is exactly correct, then under this model, The Economist assigns every college a score of 0.

If what you are saying is true, then I agree that the ratings are nonsensical. But I don't think you are correct. Their methodology consists of comparing two numbers:

1. the observed earnings

2. the estimated earnings if the same students had studied elsewhere (presumably some average)

So if we had your oracle, the numbers 1 and 2 would be different for the two colleges, not the same.
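A tiny sketch of that comparison, using the oracle numbers quoted earlier in the thread. The "average of the other colleges" baseline is my guess at number 2, not something confirmed by The Economist's methodology, and the `score` helper is hypothetical.

```python
from statistics import mean

# Oracle output from the thread: what the same student would earn at each college.
oracle = {"Harvard": 80_000, "MIT": 86_000}

def score(college, oracle):
    """Earnings at this college minus the average the same student would earn elsewhere."""
    elsewhere = [earnings for name, earnings in oracle.items() if name != college]
    return oracle[college] - mean(elsewhere)

scores = {name: score(name, oracle) for name in oracle}
print(scores)
```

Under this reading, a perfect oracle still gives the two colleges different scores (Harvard comes out $6,000 below MIT's baseline and MIT $6,000 above Harvard's), rather than 0 for both.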