by yobbo
1066 days ago
You could start by reading about CMA-ES, which is something like a particle filter over the model parameters. With 100 "particles", you sample 100 copies of the model from the current distribution and evaluate each one; the results give you something like a "synthetic" gradient, which is used to update the distribution over the model parameters. It doesn't solve the problem of local minima, though, and it will still need minibatches.
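To make the idea concrete, here is a rough sketch of that loop. It is not CMA-ES proper: real CMA-ES adapts a full covariance matrix and step size using evolution paths, while this is a much simpler truncation-selection evolution strategy with a per-coordinate standard deviation. The toy linear-regression problem, the function names (`make_data`, `loss`, `es_fit`) and all the hyperparameters are my own assumptions for illustration.

```python
import random

def make_data(n, true_w, rng):
    """Hypothetical toy problem: y = true_w . x plus a little Gaussian noise."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = sum(w * xi for w, xi in zip(true_w, x)) + rng.gauss(0.0, 0.01)
        data.append((x, y))
    return data

def loss(w, batch):
    """Mean squared error of the linear model w on a batch of (x, y) pairs."""
    total = 0.0
    for x, y in batch:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        total += (pred - y) ** 2
    return total / len(batch)

def es_fit(data, dim, pop_size=100, elite=20, steps=60, batch_size=32, seed=0):
    """Simplified evolution strategy (assumed settings, not tuned)."""
    rng = random.Random(seed)
    mean = [0.0] * dim   # centre of the search distribution
    sigma = [1.0] * dim  # per-coordinate std dev (diagonal covariance only)
    for _ in range(steps):
        # Every candidate is scored on the same minibatch in a given step.
        batch = rng.sample(data, batch_size)
        # The 100 "particles": parameter vectors sampled around the mean.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, sigma)]
            for _ in range(pop_size)
        ]
        population.sort(key=lambda w: loss(w, batch))
        best = population[:elite]
        # Moving the mean toward the best samples plays the role of the
        # "synthetic" gradient step; no derivative of the loss is used.
        mean = [sum(w[i] for w in best) / elite for i in range(dim)]
        # Refit the spread to the elite set, with a floor so the search
        # does not collapse entirely.
        sigma = [
            max(1e-3, (sum((w[i] - mean[i]) ** 2 for w in best) / elite) ** 0.5)
            for i in range(dim)
        ]
    return mean
```

Calling `es_fit(make_data(500, [1.0, -0.5, 0.25], random.Random(1)), dim=3)` should land close to the true weights on this convex toy problem. On a non-convex loss the same loop can settle into whichever basin the initial distribution happens to cover, which is the local-minima caveat above.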