|
|
|
|
|
by quantadev
618 days ago
|
|
Right. All the squashing is doing is keeping the output of any neuron in a range of below 1. But the entire NN itself (Perceptron ones, which most LLMs are) is still completely using nothing but linearity to store all the knowledge from the training process. All the weights are just an 'm' in the basic line equation 'y=m*x+b'. The entire training process does nothing but adjust a bunch of slopes of a bunch of lines. It's totally linear. No non-linearity at all. |
|
> nothing but linearity
No, if you have non linearities, the NN itself is not linear. The non linearities are not there primarily to keep the outputs in a given range, though that's important, too.