It's good to communicate for a general audience, but your presentation misleads rather than simplifies.

> meaning the combined measurement accurately predicts how a potential college applicant will perform in their first year of college 56% of the time.

"Accurately predicts ... 56% of the time" implies that half of predictions are 'accurate', which most readers would interpret as 'correct', i.e. knowing SAT + HSGPA allows you to state FYGPA _exactly_ for about half of cases. That's not what the research you cited says. Rather, the square of the multiple correlation R (which is exactly R^2, the coefficient of determination) indicates how much of the variance in the output variable is explained by the input variables. That quantity _must_ be communicated in terms of the strength of the relationship, not as accuracy for a given case or for a share of cases, because it doesn't tell us anything about any given case. One could say it tells us about 30% (0.56^2, a correction from my statement above) of the information we'd need to perfectly predict the outcome, or that the relationship is better than random but doesn't predict perfectly, or ...

Additionally, Table 5 of the link you cited indicates the adjusted correlation coefficient between FYGPA and the combination of HSGPA and SAT is 0.62. None of the numbers in that table are 0.56, so I'm not sure where you pulled that exact number from. I've used 0.56/56% above to be clear which quantity I'm referring to.
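To make the difference between "correlation of 0.56" and "right 56% of the time" concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration, not something from the cited study: the scores are standardized and normally distributed, the predictor is a single combined variable, and the names `simulate`, `pearson_r`, and the 0.1-standard-deviation `tolerance` used to define a "close" prediction are hypothetical choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(r=0.56, n=20000, tolerance=0.1, seed=0):
    """Generate standardized (x, y) pairs whose true correlation is r.

    x stands in for the combined predictor, y for the outcome.
    All values are synthetic; nothing here comes from real data.
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(1 - r ** 2)  # keeps the variance of y at 1
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [r * x + noise_sd * rng.gauss(0, 1) for x in xs]

    r_hat = pearson_r(xs, ys)
    # In standardized units the best linear prediction of y is r * x.
    close = sum(1 for x, y in zip(xs, ys) if abs(y - r * x) <= tolerance)
    return {
        "r": r_hat,
        "r_squared": r_hat ** 2,
        "share_within_tolerance": close / n,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the estimated correlation comes out near 0.56 and its square near 0.31, while the share of cases predicted to within 0.1 standard deviations is far below 56%. That share also moves with whatever tolerance you pick, which is why neither R nor R^2 can be read as a hit rate.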
Again, if you can simplify this in a way more accurate than I have, great, be my guest; I look forward to reading it.
R^2, coefficient of determination, output variable variance, etc. -- most readers aren't going to go that deep into the math. For those who do, like you, the links to the actual research are provided.
But so far all I see are data scientists complaining about how my description is not 100% statistically accurate without providing any alternative explanation that doesn't devolve into variance of output variables.
Again, be my guest to show me I'm wrong, but what you wrote above is not something that would be easy to understand for the general audience, IMHO.