by quantadev
621 days ago
> The non linearities are not there primarily to keep the outputs in a given range

Precisely what the `Activation Function` does is to squash an output into a range (normally below one, like tanh). That's the only non-linearity I'm aware of. What other non-linearities are there? All the training does is adjust linear weights tho, like I said. All the training is doing is adjusting the slopes of lines.

"only" is doing a lot of work here, because that non-linearity is enough to vastly expand the landscape of functions that an NN can approximate. If the NN were linear, you could greatly simplify the computational needs of the whole thing (as was implied by another commenter above), since any stack of linear layers collapses into a single linear map, but you'd also not get a GPT out of it.
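To make the "collapses" point concrete, here is a minimal sketch in plain Python. The weights `W1` and `W2` and the helper names (`matvec`, `matmul`, `linear_net`, `tanh_net`) are made up for illustration and don't come from any real model. It shows that two stacked linear layers give the same output as one precomputed matrix, while putting a tanh between them breaks additivity, so no single matrix can reproduce the network.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# Hypothetical weights for a tiny network: 2 inputs -> 2 hidden -> 1 output.
W1 = [[1.0, -2.0], [0.5, 3.0]]
W2 = [[2.0, -1.0]]

def linear_net(x):
    """Two linear layers with no activation in between."""
    return matvec(W2, matvec(W1, x))

def tanh_net(x):
    """Same weights, but with tanh applied to the hidden layer."""
    hidden = [math.tanh(h) for h in matvec(W1, x)]
    return matvec(W2, hidden)

# Without an activation, both layers fold into one matrix ahead of time.
W_collapsed = matmul(W2, W1)

x = [0.3, -0.7]
same_output = math.isclose(linear_net(x)[0], matvec(W_collapsed, x)[0])

# A linear map must satisfy f(a + b) == f(a) + f(b); the tanh network does not.
a, b = [1.0, 0.0], [0.0, 1.0]
a_plus_b = [ai + bi for ai, bi in zip(a, b)]
linear_is_additive = math.isclose(
    linear_net(a_plus_b)[0], linear_net(a)[0] + linear_net(b)[0]
)
tanh_is_additive = math.isclose(
    tanh_net(a_plus_b)[0], tanh_net(a)[0] + tanh_net(b)[0]
)

print("linear net equals collapsed matrix:", same_output)
print("linear net additive:", linear_is_additive)
print("tanh net additive:", tanh_is_additive)
```

Training still only adjusts the entries of `W1` and `W2` in both versions. The difference is that in the tanh version those weights feed through a curve, so the overall function is no longer a single line or plane.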