Hacker News new | ask | show | jobs
by wenc 2178 days ago
Interesting. There's been decades of research on Derivative-Free Optimization (DFO) and stochastic/evolutionary algorithms (most of which are derivative-free). They're used in practical applications, but have been hard to reliably perf benchmark because solution paths are so dependent on initial guess and random chance.

This one focuses on maximizing sample efficiency. That's an interesting (and important) metric to benchmark, especially for functions that are computationally expensive to evaluate, like full-on simulations. Sounds like the algorithm would need to be able to efficiently come up with an accurate surrogate model for the expensive function -- which is hard to do in the general case, but if something is known about the underlying function, some specialization is possible.

1 comments

Solution To benchmark in general is to run several optimization runs with a random innitialization. Though this has its limitations.

A great optimizer is RBFopt, (python based, free, fast accurate) which is able to do very well and creates a surrogate model while optimizing. My go to optimizer at this point for engineering projects. If anyone knows a better piece of software let me know.

Fascinating! I skimmed the details in the paper here [1] and my impression is that it looks pretty solid. It uses Gutmann's RBFs as surrogate structures. It also benchmarks pretty well against stochastic/metaheuristic algorithms [2].

However most examples in the paper were somewhat small and the problem type is restricted to MINLPs with box constraints (which is still tremendously useful, especially for hyperparameter optimization in simulations).

Just curious, what's the largest problem size you've managed to solve? (no. of constraints, binaries/continuous vars)

[1] http://www.optimization-online.org/DB_FILE/2014/09/4538.pdf

[2] https://www.sciencedirect.com/science/article/pii/S228843001...

Hi paper, is in the works, we designed a bilevel optimization with rbfopt on the upper level and a linear problem on the lower level. Number of constraints are in the 1000s for the linear problem. The inputs to the lower level are 17, and a single output object. We find 400 evaluations of the lower level model for the most important object we will converge to a global minimum. I test ga’s swarm couple of other they didn’t even see the solution found by rbfopt in 800 evaluations.
This blog post lists a bunch of gradient-free optimization packages, some genetic and some Bayesian: https://sigopt.com/blog/comparison-bayesian-packages-hyperpa.... Nothing from the mathematical programming community in here, so definitely other options too (depending on what kind of problems you are trying to address).

Personally, I like Nevergrad (https://facebookresearch.github.io/nevergrad/) a lot for general purpose optimization problems -- I think it is very well implemented and has a variety of tools available. I also think the documentation is appropriately honest about what is and is not known for how these algorithms work in different circumstances.

If you want something Bayesian (sample-efficient) which is very lightweight, I like PySOT (https://pysot.readthedocs.io/en/latest/). Part of why I like it is that it's written by friends of mine, but I also legitimately like its performance across a decent set of problems.

If you want something Bayesian which has corporate support (so that you know it's updated/maintained), I would recommend Botorch/Ax from Facebook (https://botorch.org/docs/botorch_and_ax). They have done a lot of research for it (a recent preprint is here https://arxiv.org/pdf/2006.05078.pdf) and have put together a very solid implementation including considerations for running online optimization problems. I think the documentation is a bit weak, but the software and research is outstanding.

Another corporate-supported option is Optuna (https://optuna.org/) from Preferred Networks. I also know some of the people working on this and I think it is a good implementation of the kernel density estimation strategy for statistical modeling -- preferring lower computational cost and consistent timing over performance. I had difficulties running it in server mode while I was testing, but if you're running locally that will not be a problem.

As is always the case with optimization strategies, there is no one answer. Different tools perform well in different circumstances. There can be bad tools, but, likely, there will never be a best tool (in my estimation).

Thank you! Lots of resources to explore!