Ghosts of Softmax: Zeros of the partition function explain training instability (github.com)
4 points by g4omingron 93 days ago
2 comments

Author here. The short version: softmax's partition function has complex zeros — from e^{iπ}+1=0 — that are invisible on the real line but cap safe step sizes at ρₐ = π/Δₐ. One JVP to compute. The repo has Colab notebooks if you want to poke at it. Happy to answer questions.
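For anyone who wants to see the zero before opening the notebooks, here is a minimal sketch of the two-logit case. It is not taken from the repo. It assumes Δₐ means the spread (max minus min) of the logit derivatives along the update direction, which is my reading rather than something stated here. The names `partition`, `zero_free_radius`, and `dlogits` are made up for illustration, and `dlogits` stands in for whatever the JVP returns.

```python
import cmath
import math


def partition(t, logits, dlogits):
    """Z(t) = sum_k exp(z_k + t * a_k), evaluated at a possibly complex step t."""
    return sum(cmath.exp(z + t * a) for z, a in zip(logits, dlogits))


def zero_free_radius(dlogits):
    """Return pi / spread, where spread = max(a_k) - min(a_k).

    For |t| below this value, factoring out exp(t * midpoint) leaves every
    term with a positive real part, so Z(t) cannot be zero there.
    ASSUMPTION: this spread is what the thread calls Delta_a.
    """
    spread = max(dlogits) - min(dlogits)
    return math.inf if spread == 0 else math.pi / spread


# Two classes with equal logits; dlogits is a hypothetical JVP output.
logits = [0.0, 0.0]
dlogits = [1.5, -0.5]

rho = zero_free_radius(dlogits)               # pi / 2 for a spread of 2
z_real = partition(rho, logits, dlogits)      # real step: sum of positive terms
z_imag = partition(1j * rho, logits, dlogits) # imaginary step of the same size

print("rho       =", rho)
print("|Z(rho)|  =", abs(z_real))
print("|Z(i rho)| =", abs(z_imag))
```

On the real line Z is a sum of positive exponentials, so it never vanishes. At the imaginary step of the same magnitude the two terms differ in phase by exactly π and cancel, which is the e^{iπ}+1=0 identity at work. With more than two classes, π/spread is only a lower bound on the distance to the nearest zero.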

Full paper: https://arxiv.org/html/2603.13552v1

Nice work! The paper feels verbose at times and could use some editing to slim it down (also, equation 6 is just equation 5 in a box) but I enjoyed it a lot nonetheless.
How does it differ from the traditional "small step good, big step bad" literature?