by cs702 2 hours ago
A superior alternative to standard Muon and AdamW optimizers for training large models.

Fantastic work, instantly valuable, immediately usable.

A big THANK YOU to the authors:

Jack Zhang, Noah Amsel, Berlin Chen, and Tri Dao