by kristjansson 2617 days ago
The concerning (mis)interpretation of your statement is what I said on the third line above:

> "accurately predicts...56% of the time" implies that half of predictions are 'accurate', which most readers would interpret as 'correct' i.e. knowing SAT + HSGPA allows you to state FYGPA _exactly_ for about half of cases.

This interpretation is easy to arrive at, and clearly does not correspond to a reasonable understanding of the source, even for a general audience.

I provide two suggestions above:

> One could say it tells us about 30% of the information we'd need to know to perfectly predict the outcome, or that the relationship is better than random, but doesn't predict perfectly
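To make the gap between the two readings concrete, here is a minimal Python sketch. It assumes the 56% figure is a correlation coefficient of r = 0.56 (an assumption, not something stated in this thread), and it runs on simulated, standardized data rather than real SAT, HSGPA, or FYGPA records. The names `ASSUMED_R`, `simulate`, and `fit_line` are made up for illustration.

```python
import math
import random

# Assumption: the "56%" under discussion is a correlation coefficient.
ASSUMED_R = 0.56


def simulate(r, n=20000, seed=0):
    """Draw n standardized (predictor, outcome) pairs with population correlation r."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs, ys = simulate(ASSUMED_R)
slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]

# Share of outcome variance accounted for by the fitted line (R squared).
mean_y = sum(ys) / len(ys)
ss_total = sum((y - mean_y) ** 2 for y in ys)
ss_resid = sum((y - p) ** 2 for y, p in zip(ys, preds))
r_squared = 1.0 - ss_resid / ss_total

# Share of predictions that land within 0.1 standard deviations of the outcome,
# a generous stand-in for "predicted correctly".
near_exact = sum(abs(y - p) < 0.1 for y, p in zip(ys, preds)) / len(ys)

print(f"R squared: {r_squared:.3f}")
print(f"share of near-exact predictions: {near_exact:.3f}")
```

Under that assumption, r squared is 0.56 × 0.56 ≈ 0.31, which is where a figure like "about 30%" would come from. The share of near-exact predictions is a different quantity and comes out well below one half in this simulation, so neither number supports "accurate 56% of the time".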

OK, but that only makes it more confusing. You say it tells us about 30% of the information we'd need to know...which makes it sound (to the lay person) like there's no connection because 70% of the information is elsewhere!

I appreciate your commitment to academic rigor but sometimes oversimplifying things, even at the cost of mathematical accuracy, is enough for a general audience who aren't going to compute variance of output variables.

This isn't about academic or mathematical rigor - this is about responsible communication of statistics.

You're right that a general audience isn't going to look at the source, nor think about variance of output variables. Therefore, it's the responsibility of us as communicators of statistics to relate the conclusions that can be drawn from the data in a way that first and foremost is not wrong or misleading, and secondly captures the concept as accurately as possible for the audience.

The first principle is the overriding obligation. Your simplification can capture as little of the information and conclusions supported by the data as you want, but it cannot imply or state conclusions that are not supported.

You're getting this response to your statement, from me and others, because your interpretation of the source can easily be read as drastically overstating the character (and strength) of the relationship supported by the data - even if that's not what you intended.

To you this is the major responsibility, but not to everyone.

I'm getting this response to my statement, from you and many other data scientists, who can't accept oversimplifying math, but all of whom have failed to produce a general description in plain English.

In some ways it's like the test: everyone hates it but no one has a better alternative. You've complained maybe 7 or 8 times in this thread about how scientifically inaccurate my general summary is, but you have not produced a description that a regular person could understand in 5 words or fewer, with no technical jargon.

‘FYGPA is ...

somewhat associated with

partially explained by the combination of

... HSGPA and SAT’

You think that's a good description?

"Something is partially explained by a combination of numbers"?

That says nothing semantic of value. Better to exaggerate the causal relationship and give a sense of meaning than offer meaningless generics like that, because at least the general reader intuits a sense of the universe of the relationship. The above implies nothing.