


	by bmc7505 2776 days ago
	It would be interesting to explore whether directly evaluating ∇²f(x) for higher-order SGD methods is now feasible for smaller DNNs, and whether this leads to faster convergence during training. Most methods, like Newton or Gauss-Newton, were thought intractable for DNNs. I'm also curious whether the angle of descent prescribed by quasi-Newton methods is empirically close to that of the true second-order step, and whether these approximations pay off in wall-clock time.
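	For a sense of what "evaluating ∇²f(x) directly" means at toy scale, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `toy_loss` is a hypothetical two-parameter quadratic standing in for a network's loss, and the Hessian comes from central finite differences, not from the faster derivative method the linked paper claims. It takes one full Newton step and measures the angle between the Newton direction and plain steepest descent.

```python
import math

def toy_loss(w):
    # Hypothetical convex quadratic standing in for a tiny network's loss.
    x, y = w
    return 0.5 * (3 * x * x + 2 * x * y + 2 * y * y) - (x + y)

def gradient(f, w, h=1e-5):
    # Central-difference gradient (illustrative, not automatic differentiation).
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        g.append((f(wp) - f(wm)) / (2 * h))
    return g

def hessian(f, w, h=1e-4):
    # Central-difference Hessian: O(n^2) loss evaluations, so small n only.
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                v = list(w)
                v[i] += di
                v[j] += dj
                return f(v)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def newton_direction(f, w):
    # Solve H d = -g, so that w + d is the full Newton update.
    g = gradient(f, w)
    return solve(hessian(f, w), [-gi for gi in g])

def angle_between(a, b):
    # Angle in radians between two vectors, clamped for numerical safety.
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

w0 = [2.0, -1.0]
d_newton = newton_direction(toy_loss, w0)
d_steepest = [-gi for gi in gradient(toy_loss, w0)]
w1 = [wi + di for wi, di in zip(w0, d_newton)]
print("after one Newton step:", w1)
print("angle to steepest descent (rad):", angle_between(d_newton, d_steepest))
```

	On a quadratic, a single Newton step lands on the minimizer, which is the best case for second-order methods. The catch for real networks is that the dense Hessian has n² entries, so this only scales if the derivatives themselves get much cheaper.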

1 comments

jkam 2775 days ago

What kind of work are you referring to when you say higher-order SGD may _now_ be feasible for deep learning? I can only find results that try to approximate second-order information.


bmc7505 2767 days ago

Not sure what you mean. The paper above claims 1000x speedups for computing second-order derivatives. I have not tested their claims, but I was speculating that such an improvement, if true, would make computing Hessians for small networks feasible. That is what I was referring to.
