by aorist 1264 days ago
> It is crucial for data scientists to carefully consider the quality and representativeness of the data they are working with

To bring this fully to the domain of epistemology, you could go further: why are certain markers of quality or measures of representativeness valid? If we develop a measure and it gives us a particular answer, can we tell whether the data or the measure is at fault?[0]

> [statistical significance] refers to the likelihood that a result or relationship observed in a sample is not simply due to chance, but rather reflects a genuine trend or pattern in the population.

This is in general not correct: statistical significance is about the probability of seeing a result at least as extreme as the observed one, assuming that the null hypothesis is true.[1] There is a narrow context in which you could interpret a p-value as being about the probability that there is a true effect, namely Bayesian inference with a flat prior, but assuming a flat prior is itself a very strong claim to make.
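To make the distinction concrete, here is a minimal sketch with a made-up coin example. Everything in it is an assumption chosen for illustration: the 60 heads in 100 flips, the single alternative hypothesis (a coin with bias 0.6), and the 90% prior on the coin being fair. The p-value needs none of the prior information, while the probability that the null is true cannot be computed without it.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical data: 60 heads in 100 flips.
n, heads = 100, 60

# One-sided p-value: P(at least 60 heads | the coin is fair).
# This conditions on the null hypothesis being true.
p_value = sum(binom_pmf(k, n, 0.5) for k in range(heads, n + 1))

# The probability that the null is true given the data needs extra inputs.
# Assumptions for illustration: the only alternative is bias 0.6,
# and we believe 90% of coins are fair before seeing any flips.
prior_null = 0.9
like_null = binom_pmf(heads, n, 0.5)  # P(60 heads | fair)
like_alt = binom_pmf(heads, n, 0.6)   # P(60 heads | bias 0.6)

# Bayes' rule over the two hypotheses.
posterior_null = (prior_null * like_null) / (
    prior_null * like_null + (1 - prior_null) * like_alt
)

print(f"p-value:            {p_value:.4f}")
print(f"P(null | 60 heads): {posterior_null:.4f}")
```

Under these assumptions the result is "significant" at the usual 0.05 level, yet the posterior probability that the coin is fair stays above one half. Change the prior or the alternative and the posterior moves, while the p-value does not.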

[0]: https://plato.stanford.edu/entries/measurement-science/#TheL...

[1]: https://en.wikipedia.org/wiki/Misuse_of_p-values#Clarificati...