Softmax: Why neural networks need non-linearity? life isn't straight-line simple

Hacker News

	Softmax: Why neural networks need non-linearity? life isn't straight-line simple (blog.sparsh.dev)
	9 points by sparshrestha 6 days ago

2 comments

microtonal 4 days ago

This is a weird post. It talks about non-linear functions, but then goes into softmax as a non-linear function. Softmax is rarely used as a direct non-linearity inside a neural network [1]; it is mostly used in the last layer, as softmax regression, which has a linear decision boundary. You can easily show this in the two-class case (the logistic function). The decision boundary is σ(a) = 0.5, where a = wx + b, so we have

    1/(1+e^-a) = 0.5

It follows trivially that 1 + e^-a = 2, so e^-a = 1 and -a must be 0 (since e^0 = 1). The decision boundary is therefore wx + b = 0, which is linear.
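A minimal Python sketch of the two-class argument, assuming a hypothetical 2-D input with made-up weights `w` and bias `b` (not from the article): a point on the hyperplane w·x + b = 0 gets probability exactly 0.5, and thresholding the logistic output at 0.5 gives the same label as checking the sign of the affine logit, because the logistic function is monotonic.

```python
import math

def sigmoid(a: float) -> float:
    """Logistic function 1 / (1 + e^(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def logit(w, b, x):
    """Affine pre-activation a = w·x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Two-class logistic classifier: class 1 iff sigmoid(a) >= 0.5."""
    return 1 if sigmoid(logit(w, b, x)) >= 0.5 else 0

# Hypothetical weights and bias, chosen only for illustration.
w = [2.0, -1.0]
b = 0.5

# 2*0.25 - 1*1.0 + 0.5 = 0, so this point lies on the hyperplane w·x + b = 0.
on_boundary = [0.25, 1.0]
print(sigmoid(logit(w, b, on_boundary)))  # 0.5

# On a grid of points, thresholding the probability at 0.5 gives the same
# label as checking which side of the hyperplane w·x + b = 0 the point is on.
grid = [[i / 4, j / 4] for i in range(-8, 9) for j in range(-8, 9)]
agree = all(predict(w, b, x) == (1 if logit(w, b, x) >= 0 else 0) for x in grid)
print(agree)  # True
```

The same reasoning carries over to K-class softmax regression: softmax preserves the ordering of the logits, so the predicted class is the argmax of the affine logits, and the boundary between classes i and j lies where (w_i - w_j)·x + (b_i - b_j) = 0, which is again a hyperplane.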

From the title, I'd expect the article to show that softmax classifiers have linear decision boundaries and to use that as motivation for introducing a non-linearity in a hidden layer.
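As a sketch of that motivation, the classic XOR example works: no single hyperplane separates the four XOR points, so a softmax/logistic classifier on the raw inputs cannot get all four right, but one hidden layer of ReLU units fixes that. The weights below are hand-picked and hypothetical, not taken from the article.

```python
def relu(z: float) -> float:
    """Elementwise non-linearity max(0, z)."""
    return max(0.0, z)

def xor_net(x1: float, x2: float) -> float:
    # Hidden layer: two affine units followed by ReLU (hypothetical weights).
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Output layer: a plain affine map of the hidden features.
    return h1 - 2.0 * h2

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
outputs = [xor_net(x1, x2) for x1, x2 in points]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]

# Threshold the output at 0.5 to get class labels matching XOR.
labels = [1 if out >= 0.5 else 0 for out in outputs]
print(labels)  # [0, 1, 1, 0]
```

The output unit is still only an affine map of (h1, h2), so its decision boundary is linear in the hidden features; the ReLU in the hidden layer is what bends the boundary in the original (x1, x2) space.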

[1] You could of course argue that softmax as used in attention is a non-linearity in the attention layer, but it is used differently than a direct application of a non-linearity like ReLU, GELU, etc. to an affine transformation.
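To illustrate the distinction in the footnote, a small plain-Python sketch with hypothetical score values: ReLU acts on each entry of an affine output independently, while softmax normalizes a whole vector of scores together, so changing one score changes every output weight. This only contrasts the two functions; it is not an attention implementation.

```python
import math

def relu_vec(z):
    """Apply ReLU to each entry independently."""
    return [max(0.0, zi) for zi in z]

def softmax(z):
    """Normalize a score vector into weights that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 2.0, 3.0]
bumped = [1.0, 2.0, 5.0]  # only the last score changes

# ReLU: the first two outputs are unaffected by the change to the last entry.
print(relu_vec(scores)[:2], relu_vec(bumped)[:2])

# Softmax: the first two weights shrink, because all entries share one normalizer.
print(softmax(scores)[:2], softmax(bumped)[:2])
```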

sparshrestha 6 days ago

Math functions that calculate a weighted sum of the inputs and add a bias to give non-linearity to the neuron's output.

microtonal 4 days ago

It doesn't, though. Wx + b is an affine transformation, which is just a linear transformation plus a translation: https://en.wikipedia.org/wiki/Affine_transformation
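A quick numeric check of why the affine point matters, sketched in plain Python with hypothetical layer sizes and weights: two stacked affine layers with no activation between them collapse into a single affine layer with W = W2·W1 and b = W2·b1 + b2, so stacking alone adds no non-linearity. Inserting a ReLU between the layers breaks that equivalence.

```python
def matvec(W, x):
    """Matrix-vector product for lists of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix product A·B for lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * c for a, c in zip(row, col)) for col in cols] for row in A]

def vadd(u, v):
    return [ui + vi for ui, vi in zip(u, v)]

def affine(W, b, x):
    """Affine transformation Wx + b."""
    return vadd(matvec(W, x), b)

# Two hypothetical layers: 2 inputs -> 3 hidden -> 2 outputs.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b1 = [0.5, -1.0, 2.0]
W2 = [[2.0, 0.0, -1.0], [1.0, 1.0, 1.0]]
b2 = [1.0, 0.0]

# Collapsed single layer: W = W2·W1, b = W2·b1 + b2.
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)

x = [0.3, -0.7]
two_layers = affine(W2, b2, affine(W1, b1, x))  # no activation in between
one_layer = affine(W, b, x)                     # same result, up to rounding

# With a ReLU between the layers, the result no longer matches one affine layer.
with_relu = affine(W2, b2, [max(0.0, h) for h in affine(W1, b1, x)])
print(two_layers, one_layer, with_relu)
```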

sparshrestha 2 days ago

Yes, thanks for the correction. The opening line was not meant as an excerpt. It was describing the neuron’s weighted sum plus bias, not softmax.

Activation functions are what provide the non-linearity. I have fixed it. I believe the rest is correct.
