by datastoat 2384 days ago
A more charitable take on machine learning: you decide that your criterion is predictive accuracy, and you evaluate it on a holdout set (or you cross-validate).

The idea of evaluation on a holdout set is actually frequentist: it's equivalent to "I really want my model to work well on the true distribution, but that's unknown, so I shall approximate it by the empirical distribution of the data." The empirical distribution is the maximum likelihood fit to the data, if you allow yourself the entire space of distributions.
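To make the "approximate the true distribution by the empirical one" step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data-generating process (y = 2x + unit Gaussian noise), the no-intercept linear model, and the sample sizes. The holdout mean squared error is just the expected loss under the empirical distribution of the holdout points, and it is compared against a very large fresh sample standing in for the unknown true distribution.

```python
import random

def simulate(n, rng):
    """Draw n (x, y) pairs from a hypothetical 'true' distribution: y = 2x + noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return data

def fit_slope(data):
    """Least-squares slope for the no-intercept model y ~ w * x."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

def mse(w, data):
    """Average squared error, i.e. expected loss under the empirical distribution of `data`."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

rng = random.Random(0)          # fixed seed so the run is repeatable
train = simulate(200, rng)      # data used to fit the model
holdout = simulate(2000, rng)   # data never seen during fitting

w = fit_slope(train)

# Holdout estimate: loss averaged over the empirical distribution of the holdout set.
holdout_risk = mse(w, holdout)

# Stand-in for the true risk: the same loss on a much larger fresh sample.
# In real problems this quantity is not available, which is the whole point.
reference_risk = mse(w, simulate(200_000, rng))

print("fitted slope:", w)
print("holdout risk:", holdout_risk)
print("large-sample reference risk:", reference_risk)
```

The two risk numbers should land close to each other, and the gap shrinks as the holdout set grows. Cross-validation reuses the same idea but rotates which points play the holdout role.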

Compare to how Bayesians do model selection... I've seen several versions:

-- "I have a prior on the set of models, and I compute the model evidence using Bayesian principles, and thereby update my beliefs about the set of models." (This is a clean principled approach. Shame no one does it!)

-- "I compute model evidence using Bayesian principles. The model with the largest evidence is my favoured model." (This is nonsense.)

-- "I compute model evidence. I then use a gradient-based optimizer to find the hyperparameter values that maximize evidence." This is what is done by all sorts of "Bayesian" frameworks, such as the Gaussian Process models in sklearn. (This is classic frequentism, i.e. maximum likelihood applied to the hyperparameters, but for some strange reason Bayesians claim it as their own.)
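The three versions above can be put side by side on a toy problem. This is only a sketch under stated assumptions: the data (14 heads in 20 flips), the two candidate models (a fair coin, and a coin with a symmetric Beta(a, a) prior on its bias), and the 50/50 prior over models are all made up for illustration. For the third version a grid search stands in for the gradient-based optimizer that a real framework would use.

```python
import math

def log_beta(a, b):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_evidence_fair(n, k):
    """Model 'fair': bias fixed at 0.5, so there is nothing to integrate out."""
    return n * math.log(0.5)

def log_evidence_beta(n, k, a):
    """Model 'beta': bias p ~ Beta(a, a), integrated out analytically.
    This is the evidence of one specific sequence with k heads in n flips."""
    return log_beta(k + a, n - k + a) - log_beta(a, a)

n, k = 20, 14  # hypothetical data: 14 heads out of 20 flips

log_ev = {
    "fair": log_evidence_fair(n, k),
    "beta": log_evidence_beta(n, k, a=1.0),  # a = 1 is a uniform prior on p
}

# Version 1: prior over models + evidence -> posterior over models.
prior = {"fair": 0.5, "beta": 0.5}
log_joint = {m: math.log(prior[m]) + log_ev[m] for m in log_ev}
shift = max(log_joint.values())  # subtract the max before exponentiating
norm = sum(math.exp(v - shift) for v in log_joint.values())
posterior = {m: math.exp(v - shift) / norm for m, v in log_joint.items()}

# Version 2: just pick the model with the largest evidence.
favoured = max(log_ev, key=log_ev.get)

# Version 3: treat the hyperparameter a as something to optimize,
# choosing the value that maximizes the evidence (grid search here).
grid = [0.1 * i for i in range(1, 201)]
best_a = max(grid, key=lambda a: log_evidence_beta(n, k, a))

print("posterior over models:", posterior)
print("largest-evidence model:", favoured)
print("evidence-maximizing a:", best_a)
```

On this data version 1 reports a posterior that is far from decisive, while version 2 names a single winner and throws that uncertainty away. Version 3 returns a point estimate of a, with no prior or posterior over a anywhere in sight.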

I certainly wouldn't argue that "predictive accuracy" is the be-all and end-all of modelling -- but it is a nice clean principled approach to model selection. I have honestly never seen a Bayesian who takes a principled approach to model selection.

1 comment

> A more charitable take on machine learning: you decide that your criterion is predictive accuracy, and you evaluate it on a holdout set (or you cross-validate).

I'm doing a PhD in machine learning, so I'm well aware of that. But it's Bayesian machine learning!