Hacker News new | ask | show | jobs
by jkam 2775 days ago
What kind of work are you referring to when you say higher-order SGD may _now_ be feasible for deep learning? I only find results that try to approximate second order information.
1 comments

Not sure what you mean. The paper above claims 1000x speedups for computing second-order derivatives. Have not tested their claims, but was speculating that such an improvement, if true, would make computing hessians for small networks fesiable. This is what I am referring to.