Hacker News new | ask | show | jobs
by simonblanke 1932 days ago
I think Random Restart Hill Climbing is good if your objective function evaluates fast (simple functions). It does not get stuck in local optima and also does local search very well.

Bayesian Optimization is very good if your objective function takes some time (> 1 second) to evaluate. If you look at the gifs in the readme of Gradient-Free-Optimizers you will see how fast it finds promising solutions. It is almost scary how good it is (even compared to other smbo).

2 comments

Which algorithm would you recommend when the objective function is noisy (and nondeterministic)?

For example the objective function is the "score" of a particular stochastic simulation, which can be started with varied initial random seed, or the result of a real physical experiment, which is naturally stochastic (and expensive to evaluate).

There is a tradeoff between getting a very accurate estimation of the objective function + variance of a single point vs exploring other points. Is there a search algorithm that somehow manages this tradeoff automatically?

Note: In the past I've used Tree of Parzen Estimators (Kernel density estimators), wasting 3-4 evaluations per point, but I have a feeling it is sub-optimal. Is there an "optimal" algorithm, like the optimal algorithm for the multi-armed bandit problem[1] (which is similar)

[1] https://en.wikipedia.org/wiki/Multi-armed_bandit

Bayesian Optim is designed for that case specifically. It fits a surrogate model with uncertainty estimates and picks the next point with an understanding of that uncertainty. Look up the MEI acquisition function for more info.

Edit: BO does usually require some tuning for your use case. Its acquisition function sometimes samples replicates where there’s high noise, especially if the first sample looks particularly “good”. There’s usually a hyper parameter that can be set to favor exploration vs exploitation, I.e. to favor non-replicate samples. But I am not aware of an algo that can learn your preference along that axis.

> especially if the first sample looks particularly “good”.

You've precisely described the problem: the algorithm will get stuck on a point if the first sample looks good and the assumption of zero variance. Until it randomly hits a luckier sampler (but not necessarily better point).

Another related problem, is that the boundaries of the parameter space have a bad score (objective function), but very low variance (they're always bad), which confuses the search function into believing that the interior points also have a very low variance, which is incorrect.

If anyone knows of a library that handles those cases correctly, without providing user-defined priors for each dimensions, I'd be glad to hear

Correct me if I'm wrong, but it seems the bayesian_optimization.py optimizer in this library assumes that the sampled points are exact, ie their variance is zero. It doesn't seem to re-sample existing points.

This will cause the algorithm to "chase random noise", as morelandjs wrote below

I haven't looked at his source, but usually there's a white noise term in the covariance structure of the Gaussian process regressor that adds some statistical uncertainty at each evaluation point. So even when it evaluates a point of parameter space the GPR is still somewhat uncertain about the value of the optimization function at that point. So it should balance exploration versus exploitation taking that statistical uncertainty into account.
I would be very disappointed if that were the case.. no, it looks like it’s set up to capture variance. The BO algo wraps an “Expected Improvement Optimizer”:

https://github.com/SimonBlanke/Gradient-Free-Optimizers/blob...

Which selects new points based on both the model’s mean estimate and its variance. See around line 58

line 62: exp_imp[sigma == 0.0] = 0.0

I'm afraid it never samples points more than once, since it estimated already-sampled-points as points with variance zero, and no expected improvement.

IMHO that's wrong. Variance of a single sample should be infinite (classical statistics), or similar to the variance of nearby points (bayesian+model), or some pre-defined prior (not a great idea... I'd prefer some automatic method). But not zero.

Ah, good catch. So in the event the gpr predicts zero variance, the optimizer says EI is zero and thus won’t sample again. That may depend on the settings of the gpr.. if I’m not mistaken there are ways for gpr to model noise and not collapse to zero variance on every sampled point.

Anyway, I guess I stand by my original suggestion that BO is the best tool for gradient free optim with slow and noisy fevals, but to my knowledge nobody has built a way to dial in the hyper parameters automatically. And there are quite a few. Entire companies exist for this purpose, SigOpt comes to mind..

I'm wondering the same. I'd be concerned that Random Restart Hill climbing would essentially chase random noise.
You could be right. I must confess, that i have a (probably) very narrow understanding of typical optimization problems. Most of the objective functions i optimize have machine learning algorithms in it (to optimizer hyperparameters). Depending on the evaluation those can have low noise.

If you like you could provide other use cases and applications for optimization algorithms.

Thanks, that's useful. My impression of Bayesian parameter optimization using Gaussian processes is that it is quite good when the data has a more or less constant correlation length across the evaluation points as in your example.

When there are large correlation lengths in some regions of the dataset and small correlation lengths elsewhere, it often over (or under) shoots the curvature of the hypersurface.