
by brunoalano 105 days ago

I've been experimenting with Muon on GNNs to see whether orthogonalizing updates helps with the usual depth problems.
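For readers new to Muon, "orthogonalizing updates" roughly means replacing each 2-D update matrix with its nearest (semi-)orthogonal matrix, usually via a Newton-Schulz iteration rather than a full SVD. The sketch below is dependency-free and uses the textbook cubic iteration. It is an illustration under stated assumptions, not the code used in these experiments: the helper names (`matmul`, `transpose`, `orthogonalize`) and the step count are hypothetical, and production Muon implementations typically apply a tuned higher-order polynomial to a momentum buffer instead.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonalize(update, steps=30):
    """Approximate the orthogonal polar factor of `update`.

    Cubic Newton-Schulz iteration: X <- 1.5 * X - 0.5 * X X^T X.
    Assumes `update` is nonzero; singular values that are exactly zero
    stay zero, so a rank-deficient input is not fully orthogonalized.
    """
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which is inside the convergence region of the iteration.
    norm = sum(v * v for row in update for v in row) ** 0.5
    x = [[v / norm for v in row] for row in update]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(x, xxt_x)]
    return x


# Example: the result q should satisfy q @ q^T ~= identity.
q = orthogonalize([[3.0, 1.0], [0.0, 2.0]])
```

The iteration pushes every nonzero singular value toward 1 while leaving the singular vectors alone, so the direction structure of the update survives but its scale per direction is equalized.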

In my runs, the shallow 2-layer setting looked mostly similar to AdamW. The more interesting case was moderate depth: around 8 layers, Muon was noticeably more stable and gave better final results. I also saw a fairly large robustness gap under feature noise and edge dropout.
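The post does not say how the perturbations were applied, so the following is only one plausible way to set up such a robustness test. It assumes i.i.d. Gaussian noise with standard deviation `sigma` on node features and independent removal of each edge with probability `p`. The function names and the edge-list representation are hypothetical.

```python
import random


def add_feature_noise(features, sigma, rng):
    """Return a copy of `features` with N(0, sigma^2) noise on every entry."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in features]


def drop_edges(edges, p, rng):
    """Keep each edge independently with probability 1 - p.

    `edges` is a list of (source, target) pairs. For an undirected graph
    stored as two directed edges, each direction is dropped independently
    here; drop pairs jointly if the graph must stay symmetric.
    """
    return [e for e in edges if rng.random() >= p]


# Example with a fixed seed so the perturbation is reproducible.
rng = random.Random(0)
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
edges = [(0, 1), (1, 2), (2, 0)]
noisy_features = add_feature_noise(features, sigma=0.1, rng=rng)
sparser_edges = drop_edges(edges, p=0.2, rng=rng)
```

Passing an explicit `rng` keeps the perturbation identical across optimizers, so an AdamW run and a Muon run see the same corrupted graph.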

The writeup focuses on the spectral side of the story: singular values, conditioning, and why the effect seems to show up more in deeper message-passing stacks than in the standard shallow benchmark regime.
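As a rough sketch of why singular values and conditioning enter at all (the writeup's own analysis may be framed differently): take the thin SVD of an update matrix $G$ and compare it with its orthogonalized version. The notation $\operatorname{orth}(\cdot)$ and $\kappa(\cdot)$ is introduced here for illustration only.

```latex
\begin{align*}
G &= U \Sigma V^\top, \qquad
  \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
  \sigma_1 \ge \dots \ge \sigma_r > 0 \\
\operatorname{orth}(G) &= U V^\top
  \quad \text{(every nonzero singular value replaced by 1)} \\
\kappa(G) &= \frac{\sigma_1}{\sigma_r} \ge 1, \qquad
  \kappa\bigl(\operatorname{orth}(G)\bigr) = 1
\end{align*}
```

So an ill-conditioned raw update, dominated by a few directions, becomes a perfectly conditioned step. That says nothing by itself about the conditioning of the weights after many steps, which is the part that has to be measured empirically.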

I included the negative results too: Muon is slower per epoch, it doesn’t win everywhere, and at very large depth the optimizer alone is not enough.