Hacker News
by heinrichhartman 309 days ago
Why would you learn Gaussian Processes today? Is there any application where they are still leading and have not been superseded by Deep NNets?
6 comments

I would argue there are more applications overall where Gaussian processes are superior, as most scientific applications have smaller data sets. Not every problem has enough data to take advantage of feature learning in NNs. GPs are generally reliable, interpretable, and provide excellent uncertainty estimates for free. They can be made multiscale, achieving higher precision as a function approximator than most other methods. Plus, they revert to the prior away from the data when you need that.
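
To make the "uncertainty for free" and "reversion to the prior" points concrete, here is a minimal sketch of exact GP regression in NumPy. The RBF kernel, the hyperparameter values, and the toy sine data are all assumptions chosen for illustration, not anything from a particular library.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    prior_var = np.ones(len(x_test))  # k(x, x) = 1 for the default variance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = prior_var - np.sum(v**2, axis=0)
    return mean, var

# Toy data (assumed): five noiseless samples of sin(x).
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)

# One test point near the data, one far from it.
x_test = np.array([0.5, 10.0])
mean, var = gp_posterior(x_train, y_train, x_test)
print(mean, var)
```

Near the data the variance is small; at x = 10 the kernel correlations with the training points vanish, so the prediction falls back to the prior (mean 0, variance 1) rather than extrapolating confidently.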

Another use is emulating the outputs of an agent-based model for sensitivity analysis.

Basically, they're incredibly useful in any "medium" data situation: you don't have enough data to properly train a NN (which is very data hungry in practice), but you have enough that a more traditional approach wouldn't really exploit all the information in it.

GPs essentially allow you to get a lot of the power of a NN while also being able to encode a bunch of domain knowledge you have (which is necessary when you don't have enough data for the model to effectively learn that domain knowledge). On top of that, you get variance estimates which are very important for things like forecasting.
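
One common way to encode domain knowledge is through the kernel itself, since sums and products of valid kernels are still valid kernels. The sketch below assumes a hypothetical time series with a slow trend plus a yearly cycle; the function names and all lengthscales are made up for illustration.

```python
import numpy as np

def rbf(a, b, lengthscale):
    """Squared-exponential kernel: nearby inputs are strongly correlated."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic(a, b, period, lengthscale):
    """Periodic kernel: inputs a whole number of periods apart are correlated."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)

def seasonal_kernel(a, b):
    """Assumed prior: slow trend + yearly pattern whose shape may drift."""
    trend = rbf(a, b, lengthscale=50.0)
    seasonal = periodic(a, b, period=1.0, lengthscale=1.0) * rbf(a, b, lengthscale=20.0)
    return trend + seasonal

t = np.array([0.0, 0.5, 1.0])  # time in years
K = seasonal_kernel(t, t)
print(K)
```

With this kernel, points one year apart are more strongly correlated than points half a year apart, which is exactly the kind of structure a small data set cannot teach a generic model on its own.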

The only real drawback to GPs is that they absolutely do not fit into the "fit/predict" paradigm. Properly building a scalable GP takes a deeper understanding of the model than most methods require. The mathematical foundations required to really understand what's happening when you train a sparse GP greatly exceed what is required to understand a NN, and on top of that there is a fair amount of practical insight into kernel development that is required as well. But the payoff is fantastic.
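
For a flavor of what "sparse" means here, below is a rough sketch of one of the simplest inducing-point approximations, the subset-of-regressors (SoR) predictive mean. It is not the variational sparse GP most libraries implement; the inducing locations `z`, the noise variance, and the toy data are assumptions. The point is only that the linear solve is m x m (number of inducing points) instead of n x n (number of data points).

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sor_predict_mean(x, y, z, x_test, noise=0.1):
    """Subset-of-regressors predictive mean with inducing inputs z.

    mean = K_sz (noise * K_zz + K_zx K_xz)^{-1} K_zx y
    where noise is the observation noise variance.
    """
    K_zz = rbf(z, z)
    K_zx = rbf(z, x)
    K_zs = rbf(z, x_test)
    A = noise * K_zz + K_zx @ K_zx.T
    A += 1e-8 * np.eye(len(z))  # small jitter for numerical stability
    return K_zs.T @ np.linalg.solve(A, K_zx @ y)

# Toy setup (assumed): 500 observations summarized by 11 inducing points.
x = np.linspace(0.0, 10.0, 500)
y = np.sin(x)
z = np.linspace(0.0, 10.0, 11)
x_test = np.array([2.5, 7.0])
pred = sor_predict_mean(x, y, z, x_test)
print(pred, np.sin(x_test))
```

Choosing where the inducing points go, and how to train them jointly with the kernel hyperparameters, is where most of the extra mathematical machinery comes in.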

It's worth recognizing that "attention" is really just kernel smoothing, so transformers are essentially learning sophisticated stacked kernels and ultimately share a lot in common with GPs.
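
The attention claim can be checked numerically. The sketch below compares single-head softmax attention with a Nadaraya-Watson kernel smoother that uses the similarity exp(q . k / sqrt(d)); the shapes and random inputs are arbitrary assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention (single head, no mask)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

def kernel_smoother(Q, K, V):
    """Nadaraya-Watson estimate: kernel-weighted average of the values."""
    kappa = np.exp(Q @ K.T / np.sqrt(Q.shape[-1]))
    return (kappa @ V) / kappa.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 queries
K = rng.normal(size=(6, 8))  # 6 keys
V = rng.normal(size=(6, 3))  # 6 values
same = np.allclose(softmax_attention(Q, K, V), kernel_smoother(Q, K, V))
print(same)
```

The two functions compute the same thing; the softmax is just the normalization step of the smoother. What a transformer adds is that the projections producing Q and K are learned, so the kernel itself is learned.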

AFAIK the state of the art is still a mix of new DNN and old-school techniques. Things like parameter efficiency, data efficiency, runtime performance, and understandability factor into the decision.
Bayesian optimization of, say, hyperparameters is the canonical modern usage in my view, and there are other similar optimization problems where it's the preferred approach.
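
A toy version of that loop, as a sketch: fit a GP to the evaluations so far, pick the next point by expected improvement, repeat. The objective is a hypothetical stand-in for an expensive function such as validation loss, and the kernel lengthscale, candidate grid, and iteration count are all assumptions.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, lengthscale=0.2):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x, y, x_new, noise=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    K = rbf(x, x) + noise * np.eye(len(x))
    K_s = rbf(x, x_new)
    mean = K_s.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best):
    """EI for minimization: E[max(best - f, 0)] under the GP posterior."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    return (best - mean) * cdf + std * pdf

def objective(x):
    # Hypothetical expensive function; its minimum is at x = 0.7.
    return (x - 0.7) ** 2

candidates = np.linspace(0.0, 1.0, 201)
x_obs = np.array([0.0, 1.0])  # two initial evaluations
y_obs = objective(x_obs)

for _ in range(8):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    ei = expected_improvement(mean, std, y_obs.min())
    x_next = candidates[np.argmax(ei)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmin(y_obs)]
print(best_x, y_obs.min())
```

The GP's variance is what makes this work: expected improvement trades off a low predicted mean against high uncertainty, so the loop explores gaps before it exploits.
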
To reduce the risk of being a lemming. It is in everyone's interests for some people not to follow the herd / join the plague of locusts.
You can combine deep NNets with GPs, e.g. here: https://arxiv.org/abs/1511.02222

So it isn't a matter of which is better. If you ever need to imbue your deep nets with good confidence estimates, the combination is definitely worth checking out.