by lcnielsen 250 days ago
Yeah, I did a lot of traditional optimization problems during my PhD, and this type of expression pops up all the time in higher-order gradient-based methods. You rescale or otherwise adjust the gradient based on some system-characteristic eigenvalues to promote convergence without overshooting too much.
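To make the rescaling idea concrete, here is a minimal sketch on a toy problem. The quadratic, the names (`eig_sym_2x2`, `plain_gd`, `rescaled_gd`), and the `damping` factor are all illustrative assumptions, not anything from a specific paper or library. It compares plain gradient descent with a variant that divides each eigen-direction of the gradient by the matching eigenvalue of the (positive definite) Hessian.

```python
import math

# Hypothetical test problem: f(x) = 0.5 * x^T H x with H = [[10, 3], [3, 2]].
# H is symmetric positive definite (eigenvalues 11 and 1).
A, B, D = 10.0, 3.0, 2.0


def grad(x):
    """Gradient of the quadratic, i.e. H @ x."""
    return (A * x[0] + B * x[1], B * x[0] + D * x[1])


def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest first."""
    mean, diff = 0.5 * (a + d), 0.5 * (a - d)
    r = math.hypot(diff, b)
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # rotation angle of the eigenbasis
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (mean + r, v1), (mean - r, v2)


def plain_gd(x, tol=1e-6, max_iter=10_000):
    """Fixed step 1/lambda_max: safe, but slow along the flat direction."""
    (lam_max, _), _ = eig_sym_2x2(A, B, D)
    for k in range(max_iter):
        g = grad(x)
        if math.hypot(*g) < tol:
            return k
        x = (x[0] - g[0] / lam_max, x[1] - g[1] / lam_max)
    return max_iter


def rescaled_gd(x, damping=0.5, tol=1e-6, max_iter=10_000):
    """Divide each eigen-component of the gradient by its eigenvalue.

    damping < 1 takes only part of the rescaled step, a crude guard
    against overshooting when the curvature estimate is not exact.
    """
    modes = eig_sym_2x2(A, B, D)
    for k in range(max_iter):
        g = grad(x)
        if math.hypot(*g) < tol:
            return k
        step = [0.0, 0.0]
        for lam, v in modes:
            coeff = (g[0] * v[0] + g[1] * v[1]) / lam  # projection / eigenvalue
            step[0] += coeff * v[0]
            step[1] += coeff * v[1]
        x = (x[0] - damping * step[0], x[1] - damping * step[1])
    return max_iter


if __name__ == "__main__":
    print("plain GD iterations:   ", plain_gd((1.0, 1.0)))
    print("rescaled GD iterations:", rescaled_gd((1.0, 1.0)))
```

On this toy quadratic the rescaled update is just a damped Newton step, so it contracts every direction at the same rate, while the plain version crawls along the low-curvature direction. Real methods generally have to estimate the curvature rather than read it off a known matrix.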
2 comments

This sounds a lot like what the Muon / Shampoo optimizers do.
Would you have some literature on that?
There's a ton, but it's pretty scattered. Yurii Nesterov's a big name, for example.