I have had the exact opposite experience with machine learning and statistics. In my experience, those who come from the 'statistics' side tend to use constructs, like null hypothesis significance testing (NHST), which are not consistent even from a theoretical point of view. Further, when they use them, they do awful things like p-hacking, or directly comparing t-stats as a model selection criterion, practices that are themselves rife with theoretical problems, not to mention lots of statistical biases and so forth.

I find the machine learning approach far more humble. It starts out by saying that I, as a domain expert or a statistician, probably don't know any better than a layperson what is going to work for prediction or how best to attribute efficacy for explanation. Instead of coming at the problem from a position of hubris, assuming that I and my stats background know what to do, I will instead try to arrive at an algorithmic solution that has provable inference properties, and then allow it to work and commit to it.

Either side can lead to failings if you just throw an off-the-shelf method at a problem without thinking, but there's a difference between criticizing the naivety with which a given practitioner uses a method and criticizing the method itself. When we look at the methods themselves, I see much more care and humility, and more effort to avoid statistical fallacies, in the machine learning world. I see a lot of sloppy hacks and approaches that are invalid from first principles (like NHST) on the 'statistics' side. And even when we consider how practitioners use them, both sides are pretty much equally guilty of throwing methods at a problem like a black box. Machine learning is no more of a black box than a garbage-can regression from which t-stats will be used for model selection.
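To make the t-stat selection complaint concrete, here is a minimal sketch under stated assumptions: every candidate predictor and the outcome are independent Gaussian noise, and each predictor is screened on its own with a correlation t-test rather than in one joint regression. The function names, the sample sizes, and the threshold of about 1.984 (the approximate two-sided 5% critical value at 98 degrees of freedom) are illustrative choices, not anything from the papers cited here.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_stat_from_r(r, n):
    """t-statistic for H0: correlation == 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def count_spurious_hits(n_obs=100, n_features=1000, t_crit=1.984, seed=0):
    """Count pure-noise features whose |t| clears t_crit against a noise outcome.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_features):
        # Each feature is generated independently of the outcome,
        # so any "significant" t-stat here is a false positive.
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        t = t_stat_from_r(pearson_r(feature, outcome), n_obs)
        if abs(t) > t_crit:
            hits += 1
    return hits


if __name__ == "__main__":
    print("noise features passing the t-stat screen:", count_spurious_hits())
```

Under these assumptions, something in the neighborhood of 5% of the noise features should pass the screen, so a model "selected" this way would contain predictors with no relationship to the outcome at all.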
However, all of the notorious misuses of p-values and the conflation over policy questions (questions for which a conditional posterior is necessarily required, but for which likelihood functions are substituted as a proxy for the posterior) seem uniquely problematic for the 'statistics' side. Three papers that I recommend for this sort of discussion are:

[1] "Bayesian estimation supersedes the t-test" by Kruschke, http://www.indiana.edu/~kruschke/BEST/BEST.pdf
[2] "Statistical Modeling: The Two Cultures" by Breiman, https://projecteuclid.org/euclid.ss/1009213726
[3] "Let's put the garbage-can regressions and garbage-can probits where they belong" by Achen, http://www.columbia.edu/~gjw10/achen04.pdf
I do not know enough about statistics to make a (negative) quality statement about it. I know a bit more about machine learning though, and there I also see things like: picking the most favorable cross-validation evaluation metric, comparing to "state-of-the-art" while ignoring the real SotA, generating your own data sets instead of using real-life data, improving performance by "reverse engineering" the data sets, reporting only on problems where your algo works, and other such tricks. I believe you when you say much the same is happening for statisticians.
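The metric-picking trick is easy to show with a toy case. This is only a sketch with made-up confusion counts (100 test cases, 10 of them positive); `model_a`, `model_b`, and `winner` are hypothetical names for illustration, not results from any real benchmark.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fp, tn, fn):
    """Fraction of true positives that were found."""
    positives = tp + fn
    return tp / positives if positives else 0.0


# Hypothetical confusion counts on the same imbalanced test set.
model_a = dict(tp=0, fp=0, tn=90, fn=10)   # always predicts "negative"
model_b = dict(tp=8, fp=15, tn=75, fn=2)   # finds most positives, with false alarms


def winner(metric):
    """Return which model looks better under the given metric."""
    return "A" if metric(**model_a) > metric(**model_b) else "B"


if __name__ == "__main__":
    for metric in (accuracy, recall):
        print(metric.__name__, "favors model", winner(metric))
```

With these counts, accuracy favors model A (0.90 vs 0.83) while recall favors model B (0.0 vs 0.8), so whoever gets to choose the reported metric after the fact also gets to choose the "winner".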
Maybe it was my choice of words (careful, sober). I think it's fair to say that (especially applied) machine learners care more about the result and less about how they got to that result. Cowboys, in the most positive sense of the word. I retraced where I got the cliff analogy. It's from Caruana in his video "Intelligible Machine Learning Models for Health Care" https://vimeo.com/125940125 @37:30.
"We are going too far. I think that our models are a little more complicated and higher variance than they should be. And what we really want to do is to be somewhere in the middle. We want this guy to stop and we want that statistician to get there, together we will find an optimal point, but we are not there yet."