| HN Mirror

Yes there is value, but there is also some cost, and this value can be grabbed naturally by using an optimizer that can make use of multiple direction.

Optimizer like O-LBFGS maintain internally a low-dimension quadratic approximation of the cost function, that is updated with each noisy gradient evaluation, and then the optimizer consist of taking a step in the direction that would minimize that approximation.

In the same line of thought, one can ask whether a line-search like in usual optimization algorithms is worth it or not.

The memory cost is usually storing the parameters multiple times and an increase computing cost per iteration. Whether the tradeoff is worth it or not is usually problem dependent. Using a quadratic approximation is more costly than using a linear interpolation but it often pays off when encountering saddle points.

There are also technique like Polyak averaging (parameter tracking, Teacher-Student...) that have some success in Reinforcement Learning, where they help to regularize the energy landscape.

OpenAI "Evolution Strategy" used a low number of random directions, and they found that they needed roughly ~100 (= implicit dimension of the problem) directions to have equivalent performance than backprop, but they were computing the derivative with finite difference instead of forward differentiation.