| HN Mirror



	by dccsillag 576 days ago
	Bayesian Neural Networks just seem like a failed approach, unfortunately. For one, Bayesian inference and UQ fundamentally depend on the choice of the prior, but this is rarely discussed in the Bayesian NN literature and practice, and is further compounded by how fundamentally hard these priors are to interpret and choose (what is the intuition behind a NN's parameters?). Add to that the fact that the Bayesian inference is very much approximate, and you should see the trouble. If you want UQ, 'frequentist nonparametric' approaches like Conformal Prediction and Calibration/Multi-Calibration methods seem to work quite well (especially when combined with the standard ML machinery of taking a log-likelihood as your loss), and do not suffer from any of the issues above while also giving you formal guarantees of correctness. They are a strict improvement over Bayesian NNs, IMO.

7 comments

waldrews 576 days ago

The Conformal Prediction advocates (especially a certain prominent Twitter account) tend to rehash old frequentist-vs-Bayesian arguments with more heated rhetoric than strictly necessary. That fight has been going on for almost a century now. The Bayesian counterargument (in caricature form) would be that MLE frequentists just choose an arbitrary (flat) prior, and penalty hyperparameters (common in NNs) are a de facto prior. The formal guarantees only have bite in the asymptotic setting or require convoluted statements about probabilities over repeated experiments; and asymptotically, the choice of prior doesn't matter anyway.
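To make the "penalty is a de facto prior" point concrete, here is the textbook special case, assuming an isotropic Gaussian prior on the parameters (the prior, the symbols θ, σ, λ and the dataset D are my choices for illustration, not something fixed by the argument above):

```latex
\begin{aligned}
\hat\theta_{\mathrm{MAP}}
  &= \arg\max_\theta \; \bigl[ \log p(D \mid \theta) + \log p(\theta) \bigr],
  \qquad p(\theta) = \mathcal{N}(0, \sigma^2 I) \\
  &= \arg\min_\theta \; \bigl[ -\log p(D \mid \theta) + \lambda \lVert \theta \rVert_2^2 \bigr],
  \qquad \lambda = \frac{1}{2\sigma^2}
\end{aligned}
```

So an L2 penalty with weight λ gives the same point estimate as a zero-mean Gaussian prior with variance 1/(2λ), and letting σ go to infinity (a flat prior) sends λ to 0 and recovers plain MLE.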

(I'm a moderate who uses both approaches, seeing them as part of a general hierarchical modeling method, which means I get mocked by either side for lack of purity).

Bayesians are losing ground at the moment because their computational methods haven't been advanced as fast by the GPU revolution for reasons having to do with difficulty in parallelization, but there's serious practical work (especially using JAX) to catch up, and the whole normalizing flow literature might just get us past the limitations of MCMC for hard problems.

But having said that, Conformal Prediction works as advertised for UQ as a wrapper on any point estimating model. If you've got the data for it - and in the ML setting you do - and you don't care about things like missing data imputation, error in inputs, non-iid spatio-temporal and hierarchical structures, mixtures of models, evidence decay, unbalanced data where small-data islands coexist with big data - all the complicated situations where Bayesian methods just automatically work and other methods require elaborate workarounds, yup, use Conformal Prediction.
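For anyone who hasn't seen the "wrapper" idea written down, here is a minimal sketch of split conformal prediction for regression. It assumes exchangeable data and absolute-residual scores; the function name `split_conformal_interval`, the toy polynomial predictor, and the synthetic data are all made up for illustration.

```python
import math
import numpy as np

def split_conformal_interval(predict, X_cal, y_cal, X_new, alpha=0.1):
    """Wrap any point predictor `predict` with split-conformal intervals."""
    # Nonconformity scores: absolute residuals on a held-out calibration set
    # (data the model was NOT trained on).
    scores = np.sort(np.abs(y_cal - predict(X_cal)))
    n = len(scores)
    # Finite-sample correction: take the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    q = scores[k - 1] if k <= n else np.inf
    preds = predict(X_new)
    return preds - q, preds + q

# Toy demo: noisy sine data, a polynomial fit standing in for "any point model".
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=2000)
y = np.sin(X) + 0.3 * rng.normal(size=2000)

coef = np.polyfit(X[:1000], y[:1000], deg=5)      # train split
predict = lambda x: np.polyval(coef, x)

lo, hi = split_conformal_interval(
    predict, X[1000:1500], y[1000:1500], X[1500:], alpha=0.1  # calibration / test splits
)
coverage = np.mean((y[1500:] >= lo) & (y[1500:] <= hi))
print(f"empirical coverage: {coverage:.3f}")
```

The coverage guarantee here is marginal (averaged over calibration and test draws) and relies on exchangeability, which is exactly why the non-iid and hierarchical cases listed above need extra work.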

Calibration is also a pretty magical way to improve just about any estimator. It's cheap to do and it works (although hard to guarantee anything with that in the general case...)

And don't forget quantile regression penalties! Awkward to apply in the NN setting, but an easy and effective way to do UQ in XGBoost world.
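The quantile regression penalty is usually called the pinball (or check) loss. A small NumPy sketch, with the function name `pinball_loss` and the toy data being my own choices: minimizing it over a constant prediction lands on the empirical quantile, which is why fitting one model per quantile level gives you interval bounds.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss at quantile level tau in (0, 1)."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    # Under-predictions (diff > 0) cost tau per unit,
    # over-predictions (diff < 0) cost (1 - tau) per unit.
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# Toy check: the best constant prediction under the tau = 0.9 loss
# should sit close to the empirical 90th percentile.
rng = np.random.default_rng(0)
y = rng.normal(size=5000)
grid = np.linspace(-3, 3, 601)
losses = [pinball_loss(y, c, 0.9) for c in grid]
best = grid[int(np.argmin(losses))]
print(best, np.quantile(y, 0.9))
```

Fitting one model at a low level (say 0.05) and one at a high level (0.95) gives a rough 90% interval; those intervals are not guaranteed to be calibrated on their own, so people often conformalize them afterwards.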