by bjourne 55 days ago
So softmax is an e^x projection followed by L1 normalization. Why is the e^x projection useful?
1 comment

It maps (-inf, inf) to (0, inf) in about as nice a way as you could expect: addition turns into multiplication, since e^(a+b) = e^a * e^b. When you want to constrain a value to be positive, parameterizing it with exp is usually a good option.
And importantly, it's got nice properties like being differentiable everywhere and monotonic, unlike, e.g., taking |x|, which has a kink at 0 and maps x and -x to the same value.
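
For anyone who wants to poke at this, here is a minimal sketch of softmax written as exactly those two steps: exp, then divide by the L1 norm. The function name `softmax`, the example `logits`, and the max-subtraction trick are my own additions for illustration, not something from the thread.

```python
import math

def softmax(xs):
    # Subtracting the max is a common numerical-stability trick (an addition
    # here, not from the thread). It does not change the result: a shift in x
    # becomes a common factor after exp, which cancels in the normalization.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # step 1: map each value into (0, inf)
    total = sum(exps)                     # L1 norm, since every entry is positive
    return [e / total for e in exps]      # step 2: L1 normalization

logits = [-2.0, 0.0, 3.0]                 # hypothetical inputs
probs = softmax(logits)

# Adding a constant to every logit leaves the output unchanged,
# because e^(x + c) = e^x * e^c and e^c cancels in the ratio.
shifted = softmax([x + 5.0 for x in logits])
```

Because exp is monotonic, `probs` keeps the same ordering as `logits`; swapping in |x| would rank -2.0 above 0.0.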