Hacker News
by krstck 3890 days ago
> They're basically arguing that you're better off going to a school in the middle of nowhere because "hey, for being in such a crappy location, you did pretty well!".

Well, no, not exactly. It's a subtle distinction, but what it's actually ranking is how well that school exceeds expectations, not best outcomes. This is not necessarily a list that will give a student the best school to go to, but rather (what it says on the tin) a scorecard for how well those schools are doing, given their resources.

That's my point -- who decides what expectations are? Their results are incredibly dependent on the model specification. I imagine if they changed which indicators they used, the results would vary widely.

Here's another way to see my concern. Suppose you had a classifier that achieves 1.0 R^2; then since it perfectly predicts each school's expected value, it'll assign each school a score of 0. I'm pretty suspicious of an approach where the results get worse with better predictive power.

Even if you want to do "exceeds expectations", I think you shouldn't include variables that are school specific, only variables that are student specific. In other words, for my expected outcomes, which school is best?

> Here's another way to see my concern. Suppose you had a classifier that achieves 1.0 R^2; then since it perfectly predicts each school's expected value, it'll assign each school a score of 0. I'm pretty suspicious of an approach where the results get worse with better predictive power.

This depends on your view of the importance of undergraduate education, and on what "worse" means. From my point of view, undergraduate education is an institutional obligation used to fund or justify the faculty's personal objective: research.

The reason that the model counts location is simple: universities tend to place candidates locally. I'm pretty sure the recruiters attending fairs at Stanford and Berkeley offer higher starting wages than the ones at the University of Kansas, and that a lot of that difference is simply regional cost of living. If you don't factor that in, you risk a bad school in an expensive place ranking higher than a good school in a cheap place.

> Suppose you had a classifier that achieves 1.0 R^2; then since it perfectly predicts each school's expected value, it'll assign each school a score of 0. I'm pretty suspicious of an approach where the results get worse with better predictive power.

If I'm understanding correctly, that result would indicate a world where the college you attend has no effect on your earning power, i.e. choose any college you want, because you'll earn the same amount regardless of which one you choose.

This would only apply to colleges that people in your demographic group actually attended, though. If the dataset doesn't contain any information about people like you who went to Harvard, then maybe Harvard would indeed increase your earning potential if there were a way for you to actually go there.

I'm not saying that the college you attend has no effect on earning power. It's just that I can perfectly predict the effect of each college on your earning power. Does that make sense? If I have an oracle that tells you

"if you go to Harvard, you will make $80,000, if you go to MIT, you will make $86,000",

and the oracle is exactly correct, then under this model, The Economist assigns every college a score of 0.
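
To make that concrete, here is a rough sketch of the claim, not The Economist's actual method. It assumes the score is simply observed earnings minus predicted earnings; the names `observed`, `oracle` and `scores` are made up for illustration, and only the two dollar figures come from the example above.

```python
# Figures from the oracle example above; everything else is hypothetical.
observed = {"Harvard": 80_000, "MIT": 86_000}

def oracle(school):
    # A perfect predictor: it already knows each school's outcome exactly.
    return observed[school]

def scores(observed, predict):
    # Assumed scoring rule: observed earnings minus predicted earnings.
    return {school: y - predict(school) for school, y in observed.items()}

perfect_scores = scores(observed, oracle)
print(perfect_scores)
```

Under that assumed rule, a predictor with no error leaves nothing for the score to measure, which is the concern.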

I think you are missing the key ingredient in the analysis.

The Economist is attempting to build such an oracle via statistical regression. HOWEVER, the Oracle is intentionally limited in input to a specific list of things: SAT scores, sex ratio, race breakdown, size, public or private, earning power in the city where it is located, etc.

The things that are omitted constitute the actual value the University brings to the table: quality of teachers, instruction, organizations on campus, etc. (1)

So however far off the model is for a given University, the gap must be explained by the missing inputs, i.e. largely by how "good" the University is.

If the Oracle were able to perfectly predict your earning power given that limited set of inputs, then it would basically mean that a University is completely defined by the SAT scores of its students, sex ratio, race, etc. and there's absolutely no value it adds or subtracts beyond that. That would be a very, very interesting result. But you can see why it's unlikely.

Hopefully this makes sense?

(1) Of course it's possible that there are factors like "how many trees on campus" or "how many vowels are in the name" which might also affect earnings. But we can probably agree that they're less likely to be important than the aforementioned ones ("quality of instruction", etc.).
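
A toy simulation of the argument above, as a sketch only. Every number is invented, average SAT stands in for the whole list of allowed inputs, and the linear earnings formula is an assumption rather than anything from the article. The helpers `fit_line` and `correlation` are hypothetical names written out by hand so the block needs nothing beyond the standard library.

```python
import random

random.seed(0)

# Hypothetical schools: the model sees average SAT but not "hidden" quality.
schools = []
for _ in range(200):
    sat = random.uniform(900, 1500)
    hidden = random.gauss(0, 5_000)          # omitted input, in dollars
    earnings = 10_000 + 40 * sat + hidden    # assumed true relationship
    schools.append((sat, hidden, earnings))

def fit_line(xs, ys):
    # Ordinary least squares with one predictor: returns (intercept, slope).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def correlation(a, b):
    # Pearson correlation between two equal-length lists.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

sats = [s[0] for s in schools]
hidden_quality = [s[1] for s in schools]
earnings = [s[2] for s in schools]

# Fit the limited model, then score each school by how far it beats the fit.
intercept, slope = fit_line(sats, earnings)
residuals = [y - (intercept + slope * x) for x, y in zip(sats, earnings)]

print(round(correlation(residuals, hidden_quality), 3))
```

In this setup the residuals track the hidden quality term closely, because that term is the only thing the limited model cannot see. With real data the residual also absorbs noise and any other omitted factor, which is the caveat in footnote (1).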

> and the oracle is exactly correct, then under this model, The Economist assigns every college a score of 0.

If what you are saying is true, then I agree that the ratings are nonsensical. But I don't think you are correct. Their methodology consists of comparing two numbers:

1. the observed earnings

2. the estimated earnings if the same students had studied elsewhere (presumably some average)

So if we had your oracle, the numbers 1 and 2 would be different for the two colleges, not the same.
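
Here is that reading worked through with the oracle's two numbers, as a sketch. It assumes "elsewhere" means the plain average over every other school in the list, which is only a guess at what the article does; `score` is a hypothetical helper.

```python
# Oracle predictions for one student, taken from the example quoted above.
oracle = {"Harvard": 80_000, "MIT": 86_000}

def score(school, predictions):
    # Assumption: "elsewhere" is the plain average over all other schools.
    others = [v for s, v in predictions.items() if s != school]
    expected_elsewhere = sum(others) / len(others)
    return predictions[school] - expected_elsewhere

comparison_scores = {school: score(school, oracle) for school in oracle}
print(comparison_scores)
```

Under this reading a perfect oracle still gives the two schools different, nonzero scores, since each one is compared against the alternative rather than against its own prediction.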