by esafak 1069 days ago
You have a gradient so use it instead of faffing about. As another user said, the optima are all the same since the models are wildly over-parameterized.

This is tersely stated, but it's wise. In general, following the gradient (if you have it) is a very, very good idea.
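To make the "use the gradient" point concrete, here is a toy sketch of my own that is not from either commenter. It compares plain gradient descent with a simple accept-if-better random search, standing in for gradient-free "faffing about". Both run for the same number of iterations. The objective (a 50-dimensional sum of squares), the step size `lr`, the perturbation scale `sigma`, the iteration count, and all function names are illustrative assumptions.

```python
import random


def loss(x):
    """Toy convex objective (assumed for illustration): sum of squares, minimum at the origin."""
    return sum(xi * xi for xi in x)


def grad(x):
    """Analytic gradient of loss(): d/dx_i of x_i**2 is 2*x_i."""
    return [2.0 * xi for xi in x]


def gradient_descent(x0, lr=0.1, steps=100):
    """Plain gradient descent: step against the gradient on every iteration."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return loss(x)


def random_search(x0, sigma=0.1, steps=100, seed=0):
    """Gradient-free baseline: propose a Gaussian perturbation of the current
    point and keep it only if it lowers the loss."""
    rng = random.Random(seed)
    x = list(x0)
    best = loss(x)
    for _ in range(steps):
        cand = [xi + rng.gauss(0.0, sigma) for xi in x]
        cand_loss = loss(cand)
        if cand_loss < best:
            x, best = cand, cand_loss
    return best


dim = 50
x0 = [1.0] * dim  # starting loss is 50.0
gd_loss = gradient_descent(x0)
rs_loss = random_search(x0)
print(f"gradient descent: {gd_loss:.3e}")
print(f"random search:    {rs_loss:.3e}")
```

On this quadratic, each gradient step shrinks every coordinate by a factor of 0.8, so after 100 steps the loss is about 50 * 0.8^200, roughly 2e-18. The random search only stumbles toward the minimum. The toy problem is convex, so it does not test the thread's claim about over-parameterized models. It only shows how much information a gradient carries compared with blind perturbation.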