by bmc7505
2776 days ago
It would be interesting to explore whether directly evaluating the Hessian ∇²f(x) for second-order variants of SGD is now feasible for smaller DNNs, and whether this leads to faster convergence during training. Second-order methods such as Newton or Gauss-Newton were long thought intractable for DNNs. I'm also curious whether the descent direction prescribed by quasi-Newton methods is empirically closer to the true second-order (Newton) direction, and whether these approximations pay off in wall-clock time.
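To make the question concrete, here is a rough sketch of what "evaluating ∇²f(x) directly" could look like on a deliberately tiny network. Everything in it is an assumption for illustration: the 1-2-1 tanh network, the sin(x) toy data, the damping value, and the use of central finite differences in place of automatic differentiation (a real experiment would almost certainly use autodiff). It builds the full Hessian, solves a damped Newton system, and reports the cosine of the angle between the resulting direction and plain steepest descent.

```python
import math
import random

# Hypothetical toy problem: fit y = sin(x) with a 1-2-1 tanh network (7 parameters).
DATA = [(x / 4.0, math.sin(x / 4.0)) for x in range(-8, 9)]

def loss(w, data=DATA):
    """Mean squared error; w = [w1_0, w1_1, b1_0, b1_1, w2_0, w2_1, b2]."""
    total = 0.0
    for x, y in data:
        hidden = [math.tanh(w[j] * x + w[2 + j]) for j in range(2)]
        pred = w[4] * hidden[0] + w[5] * hidden[1] + w[6]
        total += (pred - y) ** 2
    return total / len(data)

def gradient(f, w, eps=1e-5):
    """Central-difference gradient of f at w."""
    g = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += eps
        down[i] -= eps
        g.append((f(up) - f(down)) / (2 * eps))
    return g

def hessian(f, w, eps=1e-4):
    """Central-difference Hessian of f at w, symmetric by construction."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            def shifted(di, dj):
                v = list(w)
                v[i] += di
                v[j] += dj
                return f(v)
            H[i][j] = H[j][i] = (
                shifted(eps, eps) - shifted(eps, -eps)
                - shifted(-eps, eps) + shifted(-eps, -eps)
            ) / (4 * eps * eps)
    return H

def solve(A, b):
    """Gaussian elimination with partial pivoting; returns x with A x = b."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

rng = random.Random(0)
w0 = [rng.uniform(-1.0, 1.0) for _ in range(7)]
g = gradient(loss, w0)
H = hessian(loss, w0)

# Damped Newton system (H + damping * I) d = -g; the damping value is an assumption.
damping = 1e-2
A = [[H[i][j] + (damping if i == j else 0.0) for j in range(7)] for i in range(7)]
d = solve(A, [-gi for gi in g])

# Cosine of the angle between the damped Newton direction and steepest descent.
dot = sum(di * -gi for di, gi in zip(d, g))
norm_d = math.sqrt(sum(di * di for di in d))
norm_g = math.sqrt(sum(gi * gi for gi in g))
cos_angle = dot / (norm_d * norm_g)
print("cosine(Newton direction, -gradient) =", cos_angle)
```

The same cosine could be computed for a quasi-Newton direction (BFGS, say) against the exact Newton direction to probe the second question. Note that the dense Hessian costs O(n²) memory and the solve O(n³) time, which is exactly why this only looks plausible for small networks, and a small fixed damping does not guarantee a descent direction when the Hessian is indefinite.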