by wmwmwm 1482 days ago
Having just implemented a softmax() function for an online ML course, I think the Python implementation here overflows if any element of z gets big(ish), e.g. e^10000 is a big number! A spot of searching online suggests that subtracting max(z) from every entry in z before exponentiating makes it a lot more robust without changing the result, since the common factor e^(-max(z)) cancels between numerator and denominator, e.g. https://www.tutorialexample.com/implement-softmax-function-w...
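For anyone who wants to see the max-subtraction trick concretely, here is a minimal sketch in plain Python. It is not the implementation from the linked article; the function names `softmax` and `naive_softmax` are mine, and it assumes z is a non-empty list of floats. It uses only `math.exp`, which raises `OverflowError` on large inputs (a NumPy version would typically produce inf/nan instead).

```python
import math


def naive_softmax(z):
    # Direct definition: exp(z_i) / sum_j exp(z_j).
    # math.exp raises OverflowError once z_i is roughly above 709.
    exps = [math.exp(x) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax(z):
    # Shift every entry by max(z) so the largest exponent is exp(0) = 1.
    # The factor exp(-max(z)) appears in both numerator and denominator,
    # so the result is mathematically unchanged.
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)  # always >= 1, so no division by zero
    return [e / total for e in exps]
```

With this, `softmax([10000.0, 10000.0])` returns `[0.5, 0.5]`, while `naive_softmax` on the same input raises `OverflowError`. For small inputs the two agree up to floating-point rounding.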
1 comments

Correct (horse battery staple). This is how it’s done in all production implementations.