by siddboots 2994 days ago
A tool that I've found myself reaching for more and more often is Gaussian Process Regression [1] [2].

* It allows you to model essentially arbitrary functions. The main model assumption is your choice of kernel, which defines the local correlation between nearby points.

* You can draw samples from the distribution of all possible functions that fit your data.

* You can quantify which regions of the function you have more or less certainty about.

* Imagine this situation: you want to discover the functional relationship between the inputs and outputs of a long-running process. You can test any input you want, but it's not practical to exhaustively grid-search the input space. A Gaussian Process model can tell you which inputs to test next so as to gain the most information, which makes it well suited to optimising complex simulations. Used in this way, it's one means of implementing "Bayesian Optimisation" [3].
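A rough sketch of the points above, using scikit-learn's `GaussianProcessRegressor`. The `expensive_process` function, the input range, and the kernel are all made up for illustration, and picking the next input by largest predictive standard deviation is just the simplest possible rule (pure exploration). Bayesian optimisation proper usually uses an acquisition function such as expected improvement instead.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel


def expensive_process(x):
    # Hypothetical stand-in for the long-running process being modelled.
    return np.sin(3 * x) + 0.5 * x


# A handful of inputs that have already been tested.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 5, size=(6, 1))
y_train = expensive_process(X_train).ravel()

# The kernel is the main modelling assumption; RBF assumes a smooth function.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, random_state=0)
gp.fit(X_train, y_train)

# Candidate inputs we could test next.
X_grid = np.linspace(0, 5, 200).reshape(-1, 1)

# Posterior mean and per-point uncertainty (standard deviation).
mean, std = gp.predict(X_grid, return_std=True)

# Three functions drawn from the posterior, each consistent with the data.
samples = gp.sample_y(X_grid, n_samples=3, random_state=0)

# Simplest "what to test next" rule: the input the model is least sure about.
next_x = X_grid[np.argmax(std)]
print("next input to test:", next_x[0])
```

In a real loop you would run the process at `next_x`, append the result to the training data, refit, and repeat until the budget runs out.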

[1] https://en.wikipedia.org/wiki/Gaussian_process

[2] http://scikit-learn.org/stable/modules/generated/sklearn.gau...

[3] https://en.wikipedia.org/wiki/Bayesian_optimization


When I tried this to choose xgboost hyperparameters it didn't seem to perform much better than random search while also adding another layer of hyper-hyper-parameters.
Yeah. The hyperparameter story that comes with Gaussian processes is a big drawback. The choice of kernel has a massive impact.

In practice, I've found GPs to be great for getting actual insight into an unknown function, but much less useful as a black-box learner.
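To get a feel for how much the kernel matters, one option is to fit the same data with a few candidate kernels and compare the log marginal likelihood that scikit-learn reports after fitting. This is only a sketch: the toy data, the three candidate kernels, and the noise level are assumptions picked for illustration, not recommendations from this thread.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF,
    Matern,
    RationalQuadratic,
    WhiteKernel,
)

# Toy data (hypothetical): a noisy sine wave.
rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 10, size=(30, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(30)

# Candidate kernels, each encoding a different smoothness assumption.
candidates = {
    "RBF": RBF(),
    "Matern(nu=1.5)": Matern(nu=1.5),
    "RationalQuadratic": RationalQuadratic(),
}

scores = {}
for name, base in candidates.items():
    # WhiteKernel lets the model estimate the observation noise itself.
    kernel = base + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(
        kernel=kernel, n_restarts_optimizer=2, random_state=0
    )
    gp.fit(X, y)
    # Log marginal likelihood at the fitted kernel hyperparameters.
    scores[name] = gp.log_marginal_likelihood_value_
    print(f"{name}: LML = {scores[name]:.2f}, fitted kernel = {gp.kernel_}")

# Higher is better, but it is only one criterion; held-out error is another.
best = max(scores, key=scores.get)
print("highest marginal likelihood:", best)
```

Printing `gp.kernel_` shows the fitted length scales and noise level, which is often where the actual insight into the unknown function comes from.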

What kernels would you recommend trying initially? I’m still unclear on whether Gaussian processes require normally distributed data (e.g. would they work on log-log or binomial-based functions?).

I’ve wanted to apply the approach you mention a few times, but documentation seems to go from “Wiki” level to novel research articles. Are there any good introductory books / resources that aren’t beginner level? That scikit library looks handy!

Gaussianprocess.org
I guess at its root the problem may just be how much compute is available to throw at the optimization. Alternatively, there could be more efficient algos... I looked into this but never fully tested it; it seemed promising: https://news.ycombinator.com/item?id=16241659