by throwaway198846 152 days ago
I recently used these methods, and BFGS worked better than CG for me.

Absolutely plausible (BFGS is awesome), but this is situation-dependent (no free lunch and all that). In the context of training neural networks, it gets even more complicated once you take into account the implicit regularisation that comes from the optimizer itself. It's often worthwhile to try an SGD-type optimizer, BFGS, and a Newton variant to see which works best for a particular problem.
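
If it helps, here is a rough sketch of that kind of side-by-side comparison using scipy.optimize.minimize. The 2D Rosenbrock test function, the starting point, and the helper name compare_optimizers are my own choices for illustration, not anything from the parent comment. SciPy does not ship an SGD optimizer, so this only covers CG, BFGS, and a Newton variant (Newton-CG); results on a toy problem like this say little about neural network training.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess


def compare_optimizers(x0):
    """Run CG, BFGS and Newton-CG from the same start (hypothetical helper)."""
    results = {}
    for method in ("CG", "BFGS", "Newton-CG"):
        # All three methods use the analytic gradient.
        kwargs = {"jac": rosen_der}
        if method == "Newton-CG":
            # Only the Newton variant consumes the Hessian.
            kwargs["hess"] = rosen_hess
        results[method] = minimize(rosen, x0, method=method, **kwargs)
    return results


# Starting point is an assumption: the classic Rosenbrock start in 2D.
x0 = np.array([-1.2, 1.0])
results = compare_optimizers(x0)

for name, res in results.items():
    # Final objective plus function and gradient evaluation counts.
    print(f"{name:10s} f={res.fun:.3e} nfev={res.nfev} njev={res.njev}")
```

Which method looks best here depends heavily on the test function and starting point, so treat the numbers as a template for your own problem rather than a ranking.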