Hacker News
by simonblanke 1938 days ago
Gradient-Free-Optimizers is a lightweight optimization package that serves as a backend for Hyperactive: https://github.com/SimonBlanke/Hyperactive

Hyperactive can run in parallel using multiprocessing, joblib, or a custom wrapper function, and this works with all of its optimization algorithms. You could even pass a Gaussian process regressor from a library such as GPflow to the Bayesian optimizer (in both GFO and Hyperactive) to get GPU acceleration.

1 comment

Consider adding some affordance for parallel computing to Gradient-Free-Optimizers by allowing the user to provide a vectorized objective function instead of one that evaluates only a single point in the search space per function call. That leaves all the hard work of parallelization as an exercise for the user, and gives the user the flexibility to parallelize their objective function with whatever mechanism they wish.
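To make the suggestion concrete, here is a minimal sketch of what a batch-oriented interface could look like. It is not Gradient-Free-Optimizers' API: `batched_random_search` and `sphere_batch` are hypothetical names, and plain random search stands in for a real optimizer. The only point is that the optimizer calls the objective once per batch and leaves the parallelization to whatever that function does internally.

```python
import numpy as np


def batched_random_search(batch_objective, lower, upper,
                          batch_size=8, n_batches=20, seed=0):
    """Hypothetical optimizer loop that evaluates one whole batch per call."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    best_point, best_value = None, np.inf
    for _ in range(n_batches):
        # Propose batch_size points at once, shape (batch_size, n_dims).
        points = rng.uniform(lower, upper, size=(batch_size, lower.size))
        # One call per batch; the user decides how to evaluate it.
        values = np.asarray(batch_objective(points), dtype=float)
        i = int(np.argmin(values))
        if values[i] < best_value:
            best_point, best_value = points[i].copy(), float(values[i])
    return best_point, best_value


def sphere_batch(points):
    """Toy vectorized objective: one score per row of `points`."""
    return np.sum(points ** 2, axis=1)


best_point, best_value = batched_random_search(sphere_batch, [-5, -5], [5, 5])
```

A real optimizer would use the returned scores to choose the next batch; the calling convention would stay the same.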

I have previously used this approach in a project where the objective function contained a half-hour-long simulation, which was the bottleneck that made estimating a gradient intractable. When the optimization algorithm gave us a batch of several points in the search space to evaluate, our objective function could prepare and run several instances of the simulation in parallel, and return when the whole batch was complete. From there, it was easy for us to also distribute simulation runs across several machines, without needing any changes to the optimizers. We would not have been able to achieve this easily with an optimization framework that tried to manage parallelization directly, because the steps necessary to prepare the input files for the simulation software had to be done serially.
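A rough sketch of that pattern, not the original code: `prepare_inputs` and `run_simulation` are hypothetical stand-ins for the serial input-file preparation and the expensive simulation, and a thread pool is assumed as the parallel mechanism (reasonable if each worker mostly waits on an external simulation process).

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_inputs(point):
    """Hypothetical stand-in for writing simulation input files (serial step)."""
    x, y = point
    return {"x": x, "y": y}


def run_simulation(inputs):
    """Hypothetical stand-in for the long-running simulation."""
    return (inputs["x"] - 1.0) ** 2 + (inputs["y"] + 2.0) ** 2


def batch_objective(points, max_workers=4):
    # Preparation happens one point at a time, in order.
    prepared = [prepare_inputs(p) for p in points]
    # The simulations themselves run concurrently; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_simulation, prepared))


scores = batch_objective([(1.0, -2.0), (0.0, 0.0), (2.0, -2.0)])
```

The optimizer only ever sees `batch_objective`, so swapping the thread pool for remote workers would not require touching it.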

For that project we tried DIRECT, several variants of Nelder-Mead, and an evolutionary strategy. In hindsight, the Nelder-Mead variants worked best; once we had accumulated enough simulation results, it became clear that our objective function was convex and pretty well-behaved in the region of interest. Nelder-Mead was also trivial to extend to try several extra points per batch, which ensured that each of our several workstations had something to work on. (We didn't have access to a large cluster, and Nelder-Mead wouldn't generalize well to a large degree of parallelism in that manner.)
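One way to read "several extra points per batch" is to evaluate every trial point of a Nelder-Mead step at once instead of one after another. The sketch below shows that reading only; it is an assumption, not a reconstruction of what we ran. The function name is hypothetical and the coefficients (1, 2, 0.5) are the common textbook defaults.

```python
import numpy as np


def nelder_mead_candidates(simplex, values):
    """Return all four standard trial points for one step as a single batch."""
    order = np.argsort(values)
    simplex = np.asarray(simplex, dtype=float)[order]
    worst = simplex[-1]
    # Centroid of every vertex except the worst one.
    centroid = simplex[:-1].mean(axis=0)
    direction = centroid - worst
    return np.array([
        centroid + 1.0 * direction,  # reflection
        centroid + 2.0 * direction,  # expansion
        centroid + 0.5 * direction,  # outside contraction
        centroid - 0.5 * direction,  # inside contraction
    ])


simplex = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
values = np.sum(simplex ** 2, axis=1)  # toy objective values: 0, 4, 9
candidates = nelder_mead_candidates(simplex, values)
```

All four candidates can then go to the batch objective together, and the usual acceptance rules are applied to the returned scores. This keeps only a handful of workers busy, which is why it does not scale to a large cluster.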

Your parallel computing approach sounds intriguing! Could you provide an example script? I would like to look into this. If you like, you could open an issue as a feature request and provide a code snippet there.
I don't have any code handy to share. The project in question was a decade ago and we started with optimization code written in MATLAB. The objective function then turned into a thin wrapper around a Python 2 script, because that could actually spawn multiple processes, unlike the version of MATLAB we were working with. When we started distributing jobs to several machines, we used execnet: https://codespeak.net/execnet/ with hard-coded IP addresses and CPU core counts for each machine. So there's nothing pretty or particularly useful to share, even if I could dig it up from my archives.

But as far as illustrating how the optimization framework would need to work to support a vectorized objective function, you can take any existing sample objective function that's written to take N scalar arguments and update it to take N vector arguments, where the length of the vectors is the number of points in the batch to be evaluated. For simple numpy functions, there might not even need to be any changes to the code of the objective function.
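As a small illustration, assuming NumPy and using the Rosenbrock function as an arbitrary sample objective (not one taken from the package), the same function body handles a single point or a whole batch because it only uses elementwise arithmetic:

```python
import numpy as np


def rosenbrock(x, y):
    """Written with scalar arguments in mind; uses only elementwise operations."""
    return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2


# One point: two scalar arguments, one scalar result.
single = rosenbrock(1.0, 1.0)

# A batch of three points: two length-3 vectors, one length-3 result.
xs = np.array([1.0, 0.0, -1.0])
ys = np.array([1.0, 0.0, 1.0])
batch = rosenbrock(xs, ys)
```

Here `batch[i]` is the score for the point `(xs[i], ys[i])`. An objective that branches on its inputs or calls external code would need explicit changes to accept vectors.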