Hacker News new | ask | show | jobs
by CodesInChaos 2826 days ago
Softmax has two components:

1. Transform the components to e^x. This allows the neural network to work with logarithmic probabilities, instead of ordinary probabilities. This turns the common operation of multiplying probabilities into addition, which is far more natural for the linear algebra based structure of neural networks.

2. Normalize their sum to 1, since that's the total probability we need.

One important consequence of this is that bayes' theorem is very natural to such a network, since it's just multiplication of probabilities normalized by the denominator.

The trivial case of a single layer network with softmax activation is equivalent to logistic regression.

The special case of two component softmax is equivalent to sigmoid activation, which is thus popular when there are only two classes. In multi class classification softmax is used if the classes are mutually exclusive and component-wise sigmoid is used if they are independent.

1 comments

Thanks for the detailed explanation!