Hacker News
by scottedwards 4778 days ago
Great suggestion. I've been amazed to find out that many coders and amateur "data scientists" don't realize that testing the assumptions is an important part of conducting statistical analyses. Part of this may be due to the recent emphasis on machine learning techniques, which tend to be assumption-free (often just assuming independence of cases in the sample).
2 comments

Part of this may be due to the recent emphasis on machine learning techniques, which tend to be assumption-free...

No statistical technique is assumption-free, unless it is purely descriptive.

Some of them are free of explicit assumptions known by the practitioner, but that's not the same thing. In much the same way, my code is all bug-free.

    machine learning techniques, which tend to be assumption-free 
ML should be a rigorous exercise in Bayesian and classical/frequentist stats, computational methods, dataset integrity, visualization, etc., if you've been through the texts by Murphy or Bishop. It often happens that people a couple of years out of their last stats class only retain that a high R-squared, small p-values, and large t- and F-statistics are what they're looking for, and that heteroskedasticity and sphericity are just big words.

My evidence that ML is a rigorous exercise: the free texts listed here (Barber's, MacKay's and Smola's are excellent; ESL is not as accessible)

http://metaoptimize.com/qa/questions/186/good-freely-availab...
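
On heteroskedasticity being more than a big word: here is a minimal sketch of checking one regression assumption (constant error variance) instead of only reading off R-squared. It computes Koenker's studentized Breusch-Pagan statistic, which is n times the R-squared from regressing the squared residuals on the predictor, and compares it with 3.841, the 5% critical value of a chi-square with 1 degree of freedom. The simulated data, the seed, and the helper names (`ols_fit`, `r_squared`, `breusch_pagan_lm`) are made up for illustration, and it handles a single predictor only.

```python
import random

def ols_fit(x, y):
    """One-predictor least squares; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """R-squared of the one-predictor regression of y on x."""
    a, b = ols_fit(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def breusch_pagan_lm(x, y):
    """Studentized Breusch-Pagan statistic: n * R^2 of squared residuals on x."""
    a, b = ols_fit(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, sq_resid)

CHI2_1DF_95 = 3.841  # 5% critical value, chi-square with 1 degree of freedom

# Hypothetical data: same mean function, different noise structure.
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(500)]
y_const = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]   # constant noise
y_grow = [2 + 3 * xi + rng.gauss(0, xi) for xi in x]   # noise grows with x

lm_const = breusch_pagan_lm(x, y_const)
lm_grow = breusch_pagan_lm(x, y_grow)

print("constant variance LM:", round(lm_const, 2))
print("growing variance LM: ", round(lm_grow, 2), "(reject if >", CHI2_1DF_95, ")")
```

Both fits recover roughly the same slope, so the coefficient estimates alone would not reveal the problem; the statistic for the second dataset should land far above the critical value. In practice you would reach for a stats package rather than hand-rolled helpers like these.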

Thanks, @gtani, great resource. Yeah, I didn't mean to imply that ML techniques are free of ANY assumptions, just that several of the popular ones, like logistic regression, don't have distributional assumptions. (Actually, I really want to understand the VC inequality at some point, as it seems to let us draw conclusions about out-of-sample error rates without depending on distributional assumptions.)
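
For the curious, a rough sketch of what the VC inequality gives you, using one common textbook form of the bound: with probability at least 1 - delta, E_out <= E_in + sqrt((8/N) * ln(4 * m_H(2N) / delta)), where m_H is the growth function, bounded here by N^d_vc + 1. No distribution is assumed for the data, but the samples are still assumed to be drawn i.i.d. from the same distribution in and out of sample, so it is not assumption-free either. The function names and the example numbers (d_vc = 3, delta = 0.05) are my own choices for illustration.

```python
import math

def growth_bound(n, d_vc):
    """Polynomial upper bound on the growth function: m_H(n) <= n^d_vc + 1."""
    return n ** d_vc + 1

def vc_epsilon(n, d_vc, delta):
    """Width of the VC generalization bound: sqrt(8/n * ln(4 * m_H(2n) / delta))."""
    # Sum of logs avoids float overflow when the growth bound is a huge integer.
    log_term = math.log(4.0) + math.log(growth_bound(2 * n, d_vc)) - math.log(delta)
    return math.sqrt(8.0 / n * log_term)

# Hypothetical setting: VC dimension 3, 95% confidence.
for n in (1_000, 10_000, 100_000):
    eps = vc_epsilon(n, d_vc=3, delta=0.05)
    print(f"N={n:>7}: |E_out - E_in| <= {eps:.3f} with prob. >= 0.95")
```

The bound is loose (it needs a lot of data before it says anything useful), but it shrinks with N for any finite VC dimension, which is the distribution-free part.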