Hacker News
Softmax: Why neural networks need non-linearity? life isn't straight-line simple (blog.sparsh.dev)
9 points by sparshrestha 6 days ago
2 comments

This is a weird post. It talks about non-linear functions, but then goes into softmax as the non-linear function. Softmax is rarely used as a direct non-linearity inside a neural network [1]; it is mostly used in the last layer, as in softmax regression, which has linear decision boundaries. You can easily show this in the two-class case, where softmax reduces to the logistic function σ. The decision boundary is σ(a) = 0.5 with a = wx + b, so we have

    1/(1+e^(-a)) = 0.5
    1 + e^(-a) = 2
    e^(-a) = 1
It follows trivially that a must be 0 (since e^0 = 1), so the decision boundary is wx + b = 0, which is linear.
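A quick numeric check of this argument, as a Python/NumPy sketch (the weights and biases are made-up values for illustration only). In the two-class case the sigmoid is exactly 0.5 where wx + b = 0. In the multi-class case, softmax preserves the ordering of the logits within each row, so the predicted class is just the argmax of the affine scores, and the boundary between classes i and j is the hyperplane (w_i - w_j)x + (b_i - b_j) = 0.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(z):
    # Subtracting the row max is for numerical stability; it does not change the result.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Two-class case with hypothetical weights w and bias b.
w = np.array([2.0, -1.0])
b = 0.5
# A point on the line w.x + b = 0: with x0 = 0, -x1 + 0.5 = 0 gives x1 = 0.5.
x_on_boundary = np.array([0.0, 0.5])
a = w @ x_on_boundary + b
print(a, sigmoid(a))  # 0.0 0.5

# Multi-class case with hypothetical 3-class weights W (3x2) and biases c.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
c = np.array([0.0, 0.2, -0.1])
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
logits = X @ W.T + c
probs = softmax(logits)
# Within a row, softmax is strictly increasing in each logit, so the argmax of the
# probabilities equals the argmax of the affine logits: decisions depend only on Wx + c.
print(np.array_equal(probs.argmax(axis=1), logits.argmax(axis=1)))  # True
```

The softmax only turns the affine scores into probabilities; it never changes which class wins, which is why the decision regions stay linear.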

From the title, I'd expect the article to show that softmax classifiers use linear decision boundaries and then use that as the motivation for introducing a non-linearity in a hidden layer.
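To make that motivation concrete, here is a small Python/NumPy sketch with hand-picked weights (chosen for illustration, not taken from the article). XOR is not linearly separable, so no single affine score wx + b can classify it, but one hidden layer with a ReLU can. Removing the ReLU collapses the same network back into an affine function that no longer matches XOR.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0], dtype=float)

# Hidden layer: affine map followed by ReLU (hand-picked weights).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
# Output layer: plain affine map.
w2 = np.array([1.0, -2.0])
b2 = 0.0

H = relu(X @ W1.T + b1)  # each row is [x1 + x2, max(x1 + x2 - 1, 0)]
out = H @ w2 + b2
print(out)  # [0. 1. 1. 0.]

# Same weights without the ReLU: the whole network is affine in x.
out_linear = (X @ W1.T + b1) @ w2 + b2
print(out_linear)  # [2. 1. 1. 0.]
```

The only difference between the two outputs is the ReLU; the weights are identical.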

[1] You could of course argue that softmax as used in attention is a non-linearity in the attention layer, but it is used differently than a direct application of a non-linearity like ReLU, GELU, etc. to an affine transformation.
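One way to see the difference the footnote points at, as a Python/NumPy sketch with made-up scores: ReLU acts on each entry independently, while softmax normalizes across a whole vector (for example, one query's attention scores over all keys), so changing a single score changes every output weight.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

scores = np.array([1.0, 2.0, 0.5])  # hypothetical scores of one query against 3 keys
bumped = scores.copy()
bumped[0] += 1.0                    # change only the first score

# ReLU is elementwise: only the entry that changed moves.
print(relu(bumped) - relu(scores))  # [1. 0. 0.]
# Softmax couples the entries: every weight moves, and the weights still sum to 1.
print(softmax(bumped) - softmax(scores))
print(softmax(bumped).sum())
```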

> Math functions that calculate weighted sum of inputs and adds bias to give non-linearity to output of neuron.
It doesn't, though. Wx + b is an affine transformation, which is just a linear transformation plus a translation: https://en.wikipedia.org/wiki/Affine_transformation
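A short check of this point, as a Python/NumPy sketch (random matrices used only for illustration). f(x) = Wx + b fails the linearity test f(x + y) = f(x) + f(y) whenever b is nonzero, and, more to the point of the thread, stacking two such layers with no activation in between is still a single affine map: W2(W1 x + b1) + b2 = (W2 W1)x + (W2 b1 + b2).

```python
import numpy as np

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def layer1(x):
    # One "neuron layer" without an activation: weighted sum plus bias.
    return W1 @ x + b1

x, y = rng.normal(size=3), rng.normal(size=3)

# Affine, not linear: the two sides differ by exactly b1.
print(np.allclose(layer1(x + y), layer1(x) + layer1(y)))  # False

# Two stacked affine layers with no activation collapse into one affine layer.
W, b = W2 @ W1, W2 @ b1 + b2
print(np.allclose(W2 @ layer1(x) + b2, W @ x + b))  # True
```

This collapse is why the weighted sum plus bias cannot be the source of non-linearity, no matter how many such layers are stacked.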
Yes, thanks for the correction. The opening line was not meant as an excerpt. It was describing the neuron’s weighted sum plus bias, not softmax.

Activation functions are what provide the non-linearity. I have fixed it. I believe the rest is correct.