by quantadev
618 days ago
Most LLMs aren't even using a "curve" yet at all, right? All they're using is a series of linear equations, because applying the model weights is a simple multiply and add (i.e. the basic NN perceptron). Sure, there's a squashing function on the output to keep it in a range from 0 to 1, but that's done BECAUSE we're just adding stuff up. I think future NNs will probably be more adaptive than this, where some perceptrons use sine wave functions or other kinds of math functions beyond just the linear "y=mx+b". It's astounding that we DID get emergent intelligence from just doing this "curve fitting" onto "lines" rather than actual "curves".
|
Ax = y
Adding another layer is just multiplying a different set of weights times the output of the first, so
B(Ax) = y
If you remember your linear algebra course, you might see the problem: that can be simplified to
(BA)x = y
Cx = y
With C = BA, the stack is completely indistinguishable from a single layer, and thus only capable of modeling linear relationships.
To prevent this collapse, a nonlinear function must be introduced between the layers.
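
If you want to check the collapse numerically, here is a minimal sketch in plain Python. The matrices A and B, the input x, and the helper names are arbitrary values picked for illustration, not taken from any real model, and ReLU is just one common choice for the nonlinear function between layers.

```python
# Sketch: two stacked linear layers collapse into one matrix;
# putting a ReLU between them breaks that collapse.
# A, B, and x are made-up illustrative values.

def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]

def matvec(M, v):
    """Multiply a matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def relu(v):
    """Elementwise max(0, x): the nonlinearity between layers."""
    return [max(0, x) for x in v]

A = [[1, -2], [3, 4]]   # first layer weights
B = [[2, 0], [-1, 1]]   # second layer weights
C = matmul(B, A)        # the single matrix the two layers collapse into

def linear_stack(v):
    return matvec(B, matvec(A, v))

def relu_stack(v):
    return matvec(B, relu(matvec(A, v)))

x = [1, 2]
neg_x = [-1, -2]

two_layers = linear_stack(x)
one_layer = matvec(C, x)
print(two_layers, one_layer)   # identical: B(Ax) == (BA)x

# Any linear map f satisfies f(x) + f(-x) == 0. The plain stack does,
# the ReLU stack does not, so no single matrix C can reproduce it.
linear_sum = [a + b for a, b in zip(linear_stack(x), linear_stack(neg_x))]
relu_sum = [a + b for a, b in zip(relu_stack(x), relu_stack(neg_x))]
print(linear_sum, relu_sum)
```

The first print shows the two-layer and one-layer outputs agreeing. The second shows the linear stack summing to zero on x and -x while the ReLU version does not, which is exactly the property a single matrix can never have.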