Hacker News new | ask | show | jobs
by g4omingron 93 days ago
Author here. The short version: softmax's partition function has complex zeros — from e^{iπ}+1=0 — that are invisible on the real line but cap safe step sizes at ρₐ = π/Δₐ. One JVP to compute. The repo has Colab notebooks if you want to poke at it. Happy to answer questions.

Full paper https://arxiv.org/html/2603.13552v1

1 comments

Nice work! The paper feels verbose at times and could use some editing to slim it down (also, equation 6 is just equation 5 in a box) but I enjoyed it a lot nonetheless.