by microtonal 2 days ago
This is a weird post: it talks about non-linear functions, but then goes into softmax as a non-linear function. Softmax is rarely used as a direct non-linearity inside a neural network [1]; it is mostly used in the last layer, as softmax regression, which has a linear decision boundary. You can easily show this in the two-class case (logistic function). The decision boundary is σ(a)=0.5, where σ is the logistic function and a=wx+b, so we have

    1/(1+e^-a) = 0.5
It can be shown trivially that -a must be 0 (since e^0=1), so the decision boundary is wx+b=0, which is linear.
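
To make that concrete, here is a minimal sketch in plain Python. The weights `w`, the bias `b`, and the test points are made up purely for illustration. It only checks that a point on the line wx+b=0 gets probability 0.5, and that the predicted class elsewhere depends only on the sign of wx+b.

```python
import math

def sigmoid(a):
    """Logistic function 1 / (1 + e^-a)."""
    return 1.0 / (1.0 + math.exp(-a))

def affine(x, w, b):
    """Compute a = wx + b for a single input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x, w, b):
    """Return (probability, class) for a two-class logistic classifier."""
    p = sigmoid(affine(x, w, b))
    return p, int(p >= 0.5)

# Hypothetical weights and bias, chosen only for illustration.
w = [2.0, -1.0]
b = 0.5

# This point lies on the line 2*x1 - x2 + 0.5 = 0, so a = 0 there.
on_boundary = [1.0, 2.5]
p_boundary, _ = predict(on_boundary, w, b)

# Away from the line, the class is decided by the sign of wx + b alone.
points = [[0.0, 0.0], [3.0, 1.0], [-1.0, 2.0], [0.0, 4.0]]
agree = all(
    predict(x, w, b)[1] == int(affine(x, w, b) >= 0)
    for x in points
)
```

The sigmoid only rescales the score a into (0, 1); it never changes which side of the line a point falls on.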

From the title I'd expect the article to show that softmax classifiers use linear decision boundaries and would use it as a motivation to introduce a non-linearity in a hidden layer.
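
The same holds with more than two classes. As a rough sketch (the weight matrix `W`, bias `b`, and points below are hypothetical), softmax is monotonic in each score, so the winning class is just the argmax of the affine scores. The boundary between classes i and j is therefore where (w_i - w_j)x + (b_i - b_j) = 0, which is again linear.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits(x, W, b):
    """Affine scores Wx + b, one per class."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bk
            for row, bk in zip(W, b)]

def argmax(v):
    """Index of the largest entry."""
    return max(range(len(v)), key=lambda i: v[i])

# Hypothetical 3-class, 2-feature classifier, for illustration only.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.5, 0.0]

points = [[2.0, 0.0], [0.0, 2.0], [-3.0, -3.0], [1.0, -3.0]]

# Softmax never changes the winner: it matches the argmax of the raw scores.
same_winner = all(
    argmax(softmax(logits(x, W, b))) == argmax(logits(x, W, b))
    for x in points
)
```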

[1] You could of course argue that softmax as used in attention is a non-linearity in the attention layer, but it is used differently than a direct application of a non-linearity like ReLU, GELU, etc. to an affine transformation.