by p4wnc6 3830 days ago
I disagree. Most of the egregious stuff is in published statistics literature, particularly in econometrics, psychology, medicine, and biology, from researchers whose full-time job is to use statistics to solve applied problems ("domain statisticians" if you will).

Even if your definition of "statistician" only applied to Wasserman or Gelman types, I'd still say that the machine learning folks of the same level exhibit hugely more caution about the theoretical properties of their models (not a knock against Wasserman or Gelman, just a property of the rigor of e.g. PAC learning versus some ad hoc hierarchical model).


I take the narrow view on "statistician". I agree that many if not most scientists are poorly trained in statistics even though all major journals try to throw a veneer of mathematics on their publications.

As for the comparison with ML, I think a large chunk of the ML community aims for (with good reason) evidence of predictive capacity rather than theoretical soundness. Not everyone. I'll grant that a good portion care deeply about theory. Look at the arguments between SVM folks and "Neural" Nets folks.

It comes down to a difference in focus. Statistics cares about causal inference. Machine Learning cares about prediction. Nothing wrong with either, but their techniques are sometimes ill-suited for the other purpose.

I agree with your distinction between groups who care about "causal inference" like the debates between Judea Pearl and Andrew Gelman on the role of toy problems in statistics, and groups who care more about "prediction engineering" (as long as we are careful to also admit that people in the ML prediction engineering camp care very, very much about the theoretical properties of their methods, especially in avoiding overfitting, because engineering prediction in a climate of overfitting is useless).

I would just add a big third category that probably encompasses the vast majority of people who "work in statistics": people who are interested in neither causal inference nor predictive efficacy but in a much less rigorous idea of "explanatory modeling" -- and this group is generally very poor at statistical hygiene.