| HN Mirror



	by lcnielsen 298 days ago
	Yeah, I did a lot of traditional optimization problems during my PhD, and this type of expression pops up all the time with higher-order gradient-based methods. You rescale or otherwise adjust the gradient based on some eigenvalues characteristic of the system, to promote convergence without overshooting too much.
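To make the rescaling idea concrete, here is a minimal sketch of one textbook instance, not necessarily the exact expression the comment refers to. It assumes a 2D quadratic f(x) = 0.5 * x^T A x with a symmetric positive-definite A, and all function names (`sym2x2_eig`, `plain_step`, `rescaled_step`) and the `damping` parameter are hypothetical. Each gradient component is divided by the eigenvalue (curvature) along its eigenvector, which amounts to a damped Newton step for this problem.

```python
import math


def sym2x2_eig(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest eigenvalue first."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # angle of the leading eigenvector
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (mean + radius, v1), (mean - radius, v2)


def grad(A, x):
    """Gradient of f(x) = 0.5 * x^T A x, which is A x."""
    (a, b), (_, d) = A
    return (a * x[0] + b * x[1], b * x[0] + d * x[1])


def plain_step(A, x, lr):
    """Ordinary gradient descent step with a single global learning rate."""
    g = grad(A, x)
    return (x[0] - lr * g[0], x[1] - lr * g[1])


def rescaled_step(A, x, damping=1.0):
    """Step where each gradient component is divided by its eigenvalue.

    Assumes all eigenvalues are positive. damping < 1 shortens the step,
    which is one simple way to limit overshooting on non-quadratic problems.
    """
    g = grad(A, x)
    x0, x1 = x
    for lam, v in sym2x2_eig(A[0][0], A[0][1], A[1][1]):
        coeff = (g[0] * v[0] + g[1] * v[1]) / lam  # gradient along v, scaled by 1/curvature
        x0 -= damping * coeff * v[0]
        x1 -= damping * coeff * v[1]
    return (x0, x1)


if __name__ == "__main__":
    # Ill-conditioned example: eigenvalues 100 and 1.
    A = ((50.5, 49.5), (49.5, 50.5))
    x = (1.0, 0.0)
    print("rescaled, 1 step:", rescaled_step(A, x))
    z = x
    for _ in range(10):
        z = plain_step(A, z, lr=1.0 / 100.0)
    print("plain GD, 10 steps:", z)
```

With a single learning rate, the step size has to be set by the largest eigenvalue to avoid overshooting, so progress along the low-curvature direction is slow. Rescaling per eigen-direction removes that mismatch, at the cost of needing the eigendecomposition (or an approximation of it).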

2 comments

This sounds a lot like what the Muon / Shampoo optimizers do.

Would you have some literature about that?

There's a ton but it's pretty scattered. Yurii Nesterov's a big name, for example.