Hacker News
by ordinaryperson 2618 days ago
> The accuracy of any of the selection methods we use in education is very poor. It just is.

The SAT when combined with the high school GPA (HSGPA) has an adjusted correlation coefficient of 0.56 with first-year GPA, meaning the combined measurement accurately predicts how a potential college applicant will perform in their first year of college 56% of the time. [1]

That's actually pretty good. What other proposed metric can say its signal matches outcomes with 56% validity? How much you liked their essay?

Students with lower SAT scores have a first-year retention rate of about 63%, whereas students with high SAT scores have a retention rate of about 95% [2]. That is, high schoolers with poor SATs drop out of college roughly 37% of the time in their first year.
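A quick check of that arithmetic, using only the two retention rates quoted above. It assumes "not retained" is the same as "dropped out", which is a simplification (retention figures can also count transfers):

```python
# Retention rates quoted in the comment (from [2]); group labels are the comment's own.
retention = {"low SAT": 0.63, "high SAT": 0.95}


def first_year_attrition(retention_rate):
    """Share of first-year students NOT retained, i.e. 1 - retention."""
    return 1.0 - retention_rate


for group, rate in retention.items():
    print(f"{group}: {first_year_attrition(rate):.0%} not retained")
```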

Standardized tests have many problems -- obviously -- but no one has developed a less unfair system.

When colleges abandon standardized tests what else are they relying on? Random signals made up by admissions officers? That's worse than job interviewing.

I have no problem criticizing standardized testing, but I feel everyone who does should be obligated to propose a better alternative method with a higher validity rate than 56%.

[1] https://files.eric.ed.gov/fulltext/ED563202.pdf

[2] https://files.eric.ed.gov/fulltext/ED563471.pdf


> The SAT when combined with the high school GPA (HSGPA) has an adjusted correlation coefficient of 0.56 with first-year GPA, meaning the combined measurement accurately predicts how a potential college applicant will perform in their first year of college 56% of the time.

Thaaat's not what "correlation" means.

I'm summarizing for a general audience. I could say r is "the strength of the linear relationship between two variables on a graph," but I'm not sure that helps the average person understand the connection.
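For anyone who wants the formula behind that phrase, this is the textbook (Pearson) sample correlation for paired values x_i and y_i. Note the cited report uses an adjusted multiple correlation with two predictors, so this single-predictor version is only the basic idea:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

r = 1 means every point sits exactly on an upward-sloping line, and r = 0 means no linear relationship at all.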

If you have a better description, it's more helpful to chime in with that instead of "You're wrong!"

A better summary would be that those two quantities explain about half of the variation, not that they predict accurately half the time.

If you took a random sample of cases, it's not that half of them would exhibit a direct relationship between SAT and first-year GPA and the other half none (unless the data is _super_ weird). Instead, SAT would be instructive-ish in predicting first-year GPA for all of those cases.
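A small simulation of that point, using made-up standardized data with a correlation of 0.56 (not the study's data; x is a hypothetical stand-in for SAT + HSGPA and y for first-year GPA). It compares how far off predictions are with and without x:

```python
import math
import random


def simulate(rho, n=100_000, seed=0):
    """Synthetic standardized (mean 0, sd 1) score pairs with correlation rho.

    x is a hypothetical stand-in for the combined SAT + HSGPA measure and y
    for first-year GPA; neither comes from the cited study.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def rms(values):
    """Root-mean-square size of a list of prediction errors."""
    return math.sqrt(sum(v * v for v in values) / len(values))


rho = 0.56
xs, ys = simulate(rho)

# For standardized variables the best straight-line prediction is y_hat = rho * x.
errors_with_x = [y - rho * x for x, y in zip(xs, ys)]
# Baseline that ignores x: predict the average outcome (0) for everyone.
errors_without_x = ys

exactly_right = sum(1 for e in errors_with_x if abs(e) < 1e-9)
print(f"exactly right: {exactly_right} of {len(xs)}")
print(f"typical error ignoring x: {rms(errors_without_x):.2f}")
print(f"typical error using x:    {rms(errors_with_x):.2f}")
```

The typical error using x should land near √(1 − 0.56²) ≈ 0.83 versus about 1.0 without it: essentially every prediction gets somewhat better, and essentially none becomes exact.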

Explaining half the variation, and the other half?

The point was to draw a connection for the general audience, not present the most scientifically accurate description of a relationship between two variables -- that's what the links to the research are for.

It's good to communicate for a general audience, but your presentation misleads rather than simplifies.

> meaning the combined measurement accurately predicts how a potential college applicant will perform in their first year of college 56% of the time.

"accurately predicts...56% of the time" implies that half of predictions are 'accurate', which most readers would interpret as 'correct', i.e. knowing SAT + HSGPA allows you to state FYGPA _exactly_ for about half of cases. That's not what the research you cited says. Rather, the square of the multiple correlation R (i.e. R^2, the coefficient of determination) indicates how much of the variance in the output variable is explained by the input variables. That quantity _must_ be communicated in terms of the strength of the relationship, not as accuracy for a given case or share of cases, since it doesn't tell us anything about any individual case. One could say it tells us about 30% (0.56^2, correcting my statement above) of the information we'd need to perfectly predict the outcome, or that the relationship is better than random but doesn't predict perfectly, or ...

Additionally, table 5 of the report you cited indicates the adjusted correlation coefficient between FYGPA and the combination of HSGPA and SAT is 0.62. None of the numbers in that table are 0.56, so I'm not sure where you got that exact number. I've used 0.56/56% above to be clear which quantity I'm referring to.
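The squaring step above, spelled out for both figures mentioned in this exchange (0.56 from the original comment, 0.62 as the reply reports reading it from table 5):

```python
def variance_explained(r):
    """Square a correlation coefficient to get R^2, the share of variance explained."""
    return r ** 2


for r in (0.56, 0.62):
    print(f"r = {r:.2f} -> R^2 = {variance_explained(r):.2f}")
```

Either way, roughly a third of the variance in first-year GPA is accounted for, well short of "half", and neither number says anything about being exactly right for a particular student.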

Uh...this is exactly what I mean. Your description is 100% scientifically accurate but probably way beyond the average reader.

Again, if you can simplify this in a way more accurate than I have, great, be my guest -- I look forward to reading it.

R^2, coefficient of determination, output variable variance, etc., etc. -- most readers aren't going to go that deep into the math. For those who do, like you, the links to the actual research are provided.

But so far all I see are data scientists complaining about how my description is not 100% statistically accurate without providing any alternative explanation that doesn't devolve into variance of output variables.

Again, be my guest to show me I'm wrong, but what you wrote above is not something that would be easy to understand for the general audience, IMHO.

That's not summarizing. "It's the strength of the relationship" is summarizing. "The combined measurement accurately predicts how a potential college applicant will perform in their first year of college 56% of the time" is just wrong. See Anscombe's quartet for a great example of why it's just plain wrong.

https://en.wikipedia.org/wiki/Anscombe%27s_quartet
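To make that concrete: the four Anscombe data sets (the standard published values shown on that page) have nearly identical correlations, yet look completely different when plotted. This computes r for each:

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x values for sets I-III
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in quartet.items():
    print(f"{name}: r = {pearson_r(xs, ys):.3f}")
```

All four come out around 0.82. Plotted, one is a noisy line, one a smooth curve, one a near-perfect line with a single outlier, and one a vertical stack plus a single far-off point. Same r, very different "accuracy" for individual points.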

And your completely scientifically accurate, but easy for the lay reader to understand, description in a few simple words is...?

Isn't that the example I used?

"It's the strength of the relationship"

I happen to like:

"It's how perfectly you can fit a straight line to them."

You can be mathematically accurate without being mathematically precise. Better imprecise but correct than incorrect but precise.

If you're trying to give a quantitative lay picture of what exactly 0.56 linear correlation means, you still need to be quantitatively right, whereas the above are qualitative. Pictures and examples can help: "For perspective, 0.56 is about the correlation between <example> and <example>."
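One way to tie "how well you can fit a straight line" to a number: fit the best line by least squares and measure how much of the scatter it removes. For a single predictor, that share equals r². The data below is synthetic with a target correlation of 0.56 (an assumption for illustration; the thread's figure is a two-predictor adjusted correlation):

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


# Synthetic data with a target correlation of 0.56 (hypothetical, not the study's data).
rng = random.Random(1)
rho = 0.56
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]

a, b = least_squares_line(xs, ys)
mean_y = sum(ys) / len(ys)
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # scatter left around the line
ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # scatter around the plain average
r = pearson_r(xs, ys)
line_fit = 1.0 - ss_res / ss_tot

print(f"sample r = {r:.3f}, r^2 = {r * r:.3f}, share of scatter the line removes = {line_fit:.3f}")
```

The last two numbers match, which is the quantitative version of "fit a straight line": at r ≈ 0.56 the line removes only about 31% of the scatter.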

I'm sorry, I'm not following your description.

Saying there is a quantitative strength to a relationship is, to a regular person, meaningless. Am I .56 in love with my wife?

Can I fit a straight line to her?

These are not good descriptions. Of course HN is full of data scientists who wildly object to oversimplifying statistical relationships -- luckily you are here to give the detailed mathematical context. But these are not simplified descriptions for a general audience.

>The SAT when combined with the high school GPA (HSGPA) has an adjusted correlation coefficient of 0.56 with first-year GPA, meaning the combined measurement accurately predicts how a potential college applicant will perform in their first year of college 56% of the time.

This is a totally incorrect interpretation of what correlation is.

Again, I'm summarizing for a general audience. If you have a better way to describe it that doesn't devolve into polynomials and linear relationships between variables on a graph, it's more helpful to do so than to just say, "You're totally incorrect!"

But a coin flip on a large number of people would trend toward 50% predictions over time.

And a correlation of 0.

Right, but what I'm trying to get at is that 56% isn't great, because random is 50, and there's no clarity about how the measure's correlation gets to 56.
Correlation is not probability. You can't compare them at all. Flipping a coin for each student would produce a correlation of 0, far lower than the correlation of 0.56 cited above. Have a look at some plots of data [1] with different correlation coefficients to see how dramatic it can be. Note the difference between r = 0.00 and r = 0.60. That's about what we're dealing with here.

[1] http://www.bwgriffin.com/gsu/courses/edur7130/images/twelve_...
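A sketch of why "56 vs. 50" isn't a like-for-like comparison. It scores coin-flip predictions two ways (hit rate and correlation) on made-up GPAs (normal with mean 3.0 and sd 0.6, clipped to 0-4, purely an assumption), then scores a synthetic predictor with r = 0.56 the same way, assuming bivariate normal data:

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
n = 50_000

# Hypothetical first-year GPAs: normal(3.0, 0.6) clipped to 0-4. Not real data.
gpas = [min(4.0, max(0.0, rng.gauss(3.0, 0.6))) for _ in range(n)]
# Coin-flip "predictions": 1 = "will be above the median GPA", 0 = "below".
coin = [rng.randint(0, 1) for _ in range(n)]

median = sorted(gpas)[n // 2]
hit_rate = sum(1 for c, g in zip(coin, gpas) if c == int(g > median)) / n
print(f"coin hit rate: {hit_rate:.1%}")                          # near 50%
print(f"correlation(coin, GPA): {pearson_r(coin, gpas):.3f}")    # near 0

# A synthetic predictor with correlation 0.56 (bivariate normal), scored the same way.
rho = 0.56
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]
mx, my = sorted(xs)[n // 2], sorted(ys)[n // 2]
hit_rate_056 = sum(1 for x, y in zip(xs, ys) if (x > mx) == (y > my)) / n
# Sheppard's formula for a bivariate normal: P(same side of median) = 1/2 + asin(rho)/pi
theory = 0.5 + math.asin(rho) / math.pi
print(f"r = 0.56 hit rate: {hit_rate_056:.1%} (theory {theory:.1%})")
```

The coin gets about 50% on hit rate but about 0 on correlation, while r = 0.56 corresponds to a hit rate near 69% under these assumptions. The two scales just measure different things.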

Which is worse than 56%

Yeah, but if randomness gets you 50, 56 doesn't feel that useful.