by visarga 156 days ago
It's not new; it's been used like that since the '80s. It scales the logits before they go into the sum of exponentials.
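Here is a minimal sketch of scaling the logits inside a sum of exponentials. It assumes the scale is a single positive divisor applied to every logit before a softmax, i.e. a temperature-style parameter. The function name `scaled_softmax` and the parameter name `temperature` are hypothetical labels for illustration, not taken from the comment.

```python
import math

def scaled_softmax(logits, temperature=1.0):
    """Softmax in which every logit is divided by `temperature` before exponentiating.

    Assumption: the scaling discussed above is one positive scalar divisor
    (a temperature-style parameter); the names here are hypothetical.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtracting the max is only for numerical stability; it cancels out.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)  # the sum of exponentials in the denominator
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
print(scaled_softmax(logits, 1.0))
print(scaled_softmax(logits, 0.5))  # smaller divisor: sharper distribution
print(scaled_softmax(logits, 5.0))  # larger divisor: flatter distribution
```

A divisor below 1 pushes probability mass toward the largest logit. A divisor above 1 flattens the distribution toward uniform. The ordering of the outputs never changes.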