by braindongle 2215 days ago

Yes, this article should parade the p-value to make its point and for some reason it doesn't.

On p-values and linear regression in general, though: when you're new to inferential statistics applied to really complex data, such as anything that relates to human behavior, you go through this "clearly that's not a linear relationship" phase. But that's not really the point. You can choose any sort of function you want to maximize r; the options are endless [0]. But linear regression has a distinct advantage in that you can interpret the model coefficients as meaningful numbers. You can say things like "for every 5% increase in the proportion of binge drinkers in your state, you can expect an X% increase in the proportion of the population that will get Covid" ...if the model clears some significance threshold, like p<0.05, and, you know, correlation equals causation. Everyone knows that. Anyway, with your great 4th-order polynomial, all you can say is "see, it fits!"
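To make the "coefficients are meaningful numbers" point concrete, here's a rough sketch in plain Python. Everything in it is an assumption for illustration: the data is synthetic (I made up a true slope of 0.2), the variable names are hypothetical, and the p-value comes from a simple permutation test rather than the usual t-test, just to keep it dependency-free.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Two-sided permutation p-value for the slope.

    Counts how often shuffling y (which destroys any real x-y link)
    gives a slope at least as large in magnitude as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fit_line(x, y)[0])
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(fit_line(x, shuffled)[0]) >= observed:
            hits += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Synthetic, made-up data: NOT real binge-drinking or Covid figures.
rng = random.Random(42)
binge_pct = [rng.uniform(10, 25) for _ in range(50)]
covid_pct = [1.0 + 0.2 * b + rng.gauss(0, 0.5) for b in binge_pct]

slope, intercept = fit_line(binge_pct, covid_pct)
p = permutation_p_value(binge_pct, covid_pct)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, p = {p:.4f}")
print(f"a 5-point rise in x predicts a {5 * slope:.2f}-point rise in y")
```

The slope is the thing you get to say out loud. A 4th-order polynomial would give you five coefficients and no sentence like that last print statement.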

About significance thresholds. Yes, they are totally arbitrary, another realization in one's journey with frequentist statistics that is quite deflating. Still, we need a rule of thumb, so we use things like p<0.05 and have a bunch of fancy ways to account for things like multiple comparisons, which, left unadjusted, increase the likelihood that some spurious correlation will appear significant.
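For a feel of why multiple comparisons matter, here's a small sketch, again plain Python. It assumes independent tests for the family-wise error rate formula, the p-values are invented for the example, and Bonferroni and Holm are just two of the standard adjustments, not the only ones.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m

def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: walk the p-values from smallest to
    largest, loosening the threshold each step, stop at the first miss."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

# Hypothetical p-values from five separate comparisons.
p_values = [0.001, 0.012, 0.03, 0.04, 0.20]

print(f"FWER for 20 tests at 0.05: {familywise_error_rate(0.05, 20):.3f}")
print("unadjusted:", [p < 0.05 for p in p_values])
print("bonferroni:", bonferroni(p_values))
print("holm:      ", holm(p_values))
```

Unadjusted, four of the five made-up results clear p<0.05; after adjustment, most of them don't. Holm is never stricter than Bonferroni, which is why it tends to be the default pick when someone just wants family-wise control.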

This stuff is all super-useful when used appropriately. That's why, when you need to create a model (outside the Bayesian/ML anything-goes world) and you need to get it right, the first thing you do is reach out to your trusty PhD statistician friend. At least, that's what I do. They spend countless hours getting to a place where they can say "in this situation, I would suggest..." I'm glad some people are into it that much.

[0] https://lmfit.github.io/lmfit-py/builtin_models.html