by OkayPhysicist
618 days ago
The "squashing function" is necessarily nonlinear in multilayer neural networks. A single layer of a neural network can be written quite simply as a weight matrix times an input vector, equalling an output vector, like so:

  Ax = y

Adding another layer is just multiplying a different set of weights by the output of the first, so:

  B(Ax) = y

If you remember your linear algebra course, you might see the problem: that can be simplified to

  (BA)x = y
  Cx = y

which is completely indistinguishable from a single layer, and thus only capable of modeling linear relationships. To prevent this collapse, a nonlinear function must be introduced between each layer.
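A quick way to check the collapse numerically is sketched below. It is only an illustration: the matrices A and B, the input x, the helper names, and the choice of ReLU as the nonlinearity are all made up for the example, not taken from any particular network.

```python
# Plain-Python helpers so the sketch needs no third-party packages.
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(P, Q):
    """Multiply matrix P by matrix Q."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def relu(v):
    """Example nonlinearity (an assumption; any nonlinear function would do)."""
    return [max(0.0, x) for x in v]

# Arbitrary example weights for two layers.
A = [[1.0, -2.0], [3.0, 1.0]]
B = [[2.0, 1.0], [-1.0, 1.0]]
C = matmul(B, A)  # the single merged matrix BA

x = [1.0, 2.0]

two_linear_layers = matvec(B, matvec(A, x))  # B(Ax)
one_merged_layer = matvec(C, x)              # Cx
with_relu = matvec(B, relu(matvec(A, x)))    # B * relu(Ax)

print(two_linear_layers)  # [-1.0, 8.0]
print(one_merged_layer)   # [-1.0, 8.0], identical to the two-layer result
print(with_relu)          # [5.0, 5.0], no longer equal to Cx
```

With the nonlinearity in between, the output for this x no longer matches Cx, so the two layers can no longer be replaced by that merged matrix.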
But the NN itself (perceptron-style ones, which most LLMs are) still uses nothing but linear terms to store all the knowledge from the training process. Every weight is just an 'm' in the basic line equation 'y = m*x + b'. The entire training process does nothing but adjust the slopes of a bunch of lines. That storage is totally linear; there's no non-linearity in it at all.
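To make the "adjusting slopes" picture concrete, here is a toy sketch of gradient descent on a single unit 'y = m*x + b'. The data points, learning rate, and step count are invented for the example, and it deliberately leaves out any activation function, so it shows only the per-weight update, not how a full multilayer network or an LLM is trained.

```python
# Toy data generated from y = 2*x + 1 (hypothetical example values).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

m, b = 0.0, 0.0  # slope and intercept, both starting at zero
lr = 0.05        # learning rate (an arbitrary choice that converges here)

for _ in range(2000):
    # Gradients of the mean squared error with respect to m and b.
    grad_m = sum(2 * (m * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (m * x + b - y) for x, y in data) / len(data)
    # Training step: nudge the slope and the offset downhill.
    m -= lr * grad_m
    b -= lr * grad_b

print(round(m, 4), round(b, 4))  # approaches 2.0 and 1.0
```

Each step changes only the two numbers m and b, which is the sense in which training moves slopes and offsets around.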