When I tried this to choose XGBoost hyperparameters, it didn't seem to perform much better than random search, while also adding another layer of hyper-hyper-parameters.
What kernels would you recommend trying initially? I’m still unclear on whether Gaussian processes require normally distributed data (e.g. would they work on log-log / binomial-based functions?).
I’ve wanted to apply the approach you mention a few times, but documentation seems to go from “Wiki” level to novel research articles. Are there any good introductory books / resources that aren’t beginner level? That scikit library looks handy!
I guess at its root the problem may just be how much compute is available to throw at the optimization. Alternatively, there could be more efficient algos... I looked into this one but never fully tested it; it seemed promising:
https://news.ycombinator.com/item?id=16241659
In practice, I've found GPs to be great for getting actual insight into an unknown function, but much less useful as a black-box learner.
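To make the "insight" point a bit more concrete, here is a minimal from-scratch sketch of 1-D GP regression in NumPy. It is not any particular library's API: the names `rbf_kernel` and `gp_posterior`, the RBF kernel choice, the fixed length scale and noise level, and the sine "unknown function" are all assumptions picked for illustration. The thing to look at is the posterior standard deviation, which tells you where the model actually knows something and where it is guessing.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, length_scale=1.0, noise=1e-2):
    """Posterior mean and std of a zero-mean GP at x_test.

    `noise` is the assumed observation-noise variance (illustrative value).
    """
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale)
    K_ss = rbf_kernel(x_test, x_test, length_scale)
    # Cholesky factorisation instead of an explicit inverse, for stability.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Hypothetical "unknown function", observed at a handful of points.
x_train = np.array([-3.0, -2.0, -1.0, 0.0, 1.0])
y_train = np.sin(x_train)

# Two query points inside the observed range, one far outside it.
x_test = np.array([0.0, 0.5, 4.0])
mean, std = gp_posterior(x_train, y_train, x_test)

for x, m, s in zip(x_test, mean, std):
    print(f"x={x:+.1f}  mean={m:+.3f}  std={s:.3f}")
```

Near the observations the standard deviation is small; at x=4.0, far from any data, it climbs back toward the prior value of 1. That kind of readout is what makes a GP useful for understanding a function, even in cases where it isn't the best black-box predictor.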