|
This is all true in a neutral net, but Transformers aren't Neural Nets in the traditional sense. I was under that impression originally, but there's not a back propagation or Hebbian learning here, which were the key bits of biomimicry that earned classic NNs their name. Transformers do have coefficients that are fit, but that's more broad.. could be used for any sort of regression or optimization, and not necessarily indicative of biological analogs. So I think the terms "learned model" of "weights" are malapropisms for Transformers, carried over from deep nets because of structural similarities, like many layers, and the development workflow. The functional units in Transformer's layers have lost their orginal biological inspiration and functional analog. The core function in Transformers is more like autoencoding/decoding (concepts from info theory) and model/grammar-free translation, with a unique attention based optimization. Transformers were developed for translation. The magic is smth like "attending" to important parts of the translation inputs&outputs as tokens are generated, maybe as a kind of deviation on pure autoencoding, due to the bias from the .. learned model :) See I can't even escape it. Attention as a powerful systemic optimization is the actual closer bit of neuro/bio-insporation here.. but more from Cog Psych than micro/neuro anatomy. Btw, not only is attention a key insight for Transformers, but it's an interesting biographical note that the lead inventor of it, Jakob Uzkereit, went on to work on a bio-AI startup after Google. |
Hebbian learning has never been used with much success in training neural nets. Backpropagation is not bio-inspired, but backpropagation is certainly used to train transformers.