by kmmlng
895 days ago
> This is all true in a neural net, but Transformers aren't Neural Nets in the traditional sense. I was under that impression originally, but there's not a back propagation or Hebbian learning here, which were the key bits of biomimicry that earned classic NNs their name. Hebbian learning has never been used with much success in training neural nets. Backpropagation is not bio-inspired, but backpropagation is certainly used to train transformers.

For Backprop, I'm basing this off the development of the Perceptron. Wiki supports this and its bio-inspired origin[1].
As for its use in Transformers, if you mean simple regression of errors or use of gradient descent, I'd agree, but that's not usually called Backprop, and the term isn't used in the original paper. The term typically means back-propagating the errors thru the entire network at a certain stage of learning, and that's not present in Transformers as far as I can tell.
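To pin down what I mean by back-propagating errors thru the entire network, here's a minimal sketch on a toy two-weight net (one tanh hidden unit, squared-error loss). The net, the function names, and the numbers are all made up for illustration and aren't from any paper. It only shows the chain-rule pass from the output error back to the first weight, checked against a finite-difference estimate.

```python
import math

def forward(w1, w2, x):
    """Toy net: x -> h = tanh(w1 * x) -> y = w2 * h."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss(w1, w2, x, target):
    """Squared error between the net's output and the target."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backprop(w1, w2, x, target):
    """Propagate the output error backward, layer by layer (chain rule)."""
    h, y = forward(w1, w2, x)
    dy = y - target               # dL/dy at the output
    dw2 = dy * h                  # gradient for the output weight
    dh = dy * w2                  # error passed back to the hidden unit
    dw1 = dh * (1.0 - h * h) * x  # through tanh', then to the first weight
    return dw1, dw2

def numeric_grad(w1, w2, x, target, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    dw1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    dw2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return dw1, dw2

if __name__ == "__main__":
    # Arbitrary illustrative values.
    print("backprop:", backprop(0.5, -1.2, 0.8, 0.3))
    print("numeric: ", numeric_grad(0.5, -1.2, 0.8, 0.3))
```

The two printed gradient pairs should agree to several decimal places. That's all the sketch is meant to show.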
Happy to see any support for your claims tho.
[1] https://en.m.wikipedia.org/wiki/Backpropagation