| HN Mirror

	by in-silico 150 days ago
	A lot of people point to the Muon optimizer, which Moonshot (the creators of Kimi) scaled up for LLM training. Compared to the standard optimizer AdamW, Muon orthogonalizes each weight-matrix update, which amplifies low-magnitude gradient directions and reportedly makes the model learn faster (and maybe gives Kimi its unique qualities). Moonshot's Muon paper: https://arxiv.org/abs/2502.16982
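	To make the "amplifies low-magnitude gradient directions" part concrete, here is a rough sketch, not Moonshot's actual code. It assumes the update is orthogonalized with the classic cubic Newton-Schulz iteration (X <- 1.5*X - 0.5*X*X^T*X), a simpler stand-in for the tuned polynomial that real Muon implementations use. The helper names (`matmul`, `transpose`, `orthogonalize`) and the toy gradient are made up for illustration.

```python
# Sketch only: orthogonalizing an update pushes all of its singular values
# toward 1, so a weak gradient direction gets roughly the same step size as
# a strong one. Assumption: classic cubic Newton-Schulz, not Muon's tuned
# iteration; momentum, learning rate, and weight decay are left out.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonalize(g, steps=30):
    """Approximate the orthogonal (polar) factor of g."""
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    norm = sum(v * v for row in g for v in row) ** 0.5 + 1e-12
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rx, rq)]
             for rx, rq in zip(x, xxtx)]
    return x


# Toy "gradient" with one strong direction (1.0) and one weak one (0.01).
grad = [[1.0, 0.0], [0.0, 0.01]]
update = orthogonalize(grad)

# Both diagonal entries end up close to 1.0: the weak direction is boosted
# about 100x relative to the raw gradient.
print(update)
```

	In this toy case plain gradient descent would barely move along the second direction, while the orthogonalized update treats both directions about equally. The 30 steps here are only needed because the cubic iteration converges slowly for small singular values.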

1 comment

Wow! Thank you