| >No amount of training would cause a fly brain to be able to do what an octopus or bird brain can, or to model their behavioral generating process. Go back a few evolutionary steps and sure you can. Most ANN architectures have few or no biases baked in, and the Transformer might be the most blank-slate architecture we've built yet. >No amount of training will cause a transformer to magically sprout feedback paths or internal memory, or an ability to alter its own weights, etc. A transformer can perform any computation it likes in a forward pass, and you can arbitrarily increase inference compute time with the token length. Feedback paths? Sure. Compute inefficient? Perhaps. Some extra programming around the model to facilitate this? Maybe, but the architecture certainly isn't stopping you. Even if it couldn't, limited ≠ trivial. The human brain is not Turing complete.
Internal memory? Did you miss the memo? Recurrence is overrated. Attention is all you need. That said, there are already state-keeping language model architectures around.
Altering weights? Can a transformer continuously train? Sure. It's not really compute efficient, but the architecture certainly doesn't prohibit it. >Architecture matters Compute efficiency? Sure. What it is capable of learning? Not so much. |
No, it can't.
A transformer has a fixed number of layers - call it N. It performs N sequential steps of computation to derive its output.
If a computation requires > N steps, then a transformer most certainly cannot perform it in a single forward pass.
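To make the depth argument concrete, here's a toy sketch in Python. It is not a real transformer: each "layer" is just a plain function doing one step of an iterative computation (Collatz steps, picked purely for illustration), and the names `forward_pass`, `collatz_step` and `autoregressive` are made up for this example. The only thing it's meant to show is that one pass through N layers gives you exactly N sequential steps, and anything longer has to come from a loop outside the pass that feeds the output back in.

```python
from typing import Callable, List

Layer = Callable[[int], int]


def forward_pass(layers: List[Layer], x: int) -> int:
    """Apply each layer once, in order: exactly len(layers) sequential steps."""
    for layer in layers:
        x = layer(x)
    return x


def collatz_step(x: int) -> int:
    """One step of a toy iterative computation (hypothetical stand-in for a layer)."""
    if x == 1:
        return 1
    return x // 2 if x % 2 == 0 else 3 * x + 1


def autoregressive(layers: List[Layer], x: int, num_passes: int) -> int:
    """Outer loop around the model: feed each pass's output back in as input."""
    for _ in range(num_passes):
        x = forward_pass(layers, x)
    return x


N = 4  # fixed depth, assumed for illustration
layers = [collatz_step] * N

# Starting from 6, reaching 1 takes 8 steps: 6, 3, 10, 5, 16, 8, 4, 2, 1.
one_pass = forward_pass(layers, 6)         # only 4 steps available
two_passes = autoregressive(layers, 6, 2)  # 8 steps via the outer loop
print(one_pass, two_passes)
```

With N = 4, the single pass stops partway through the 8-step computation; the extra steps only show up once something outside the forward pass loops the output back in.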
FYI, "attention is all you need" has the implicit context of "if all you want to build is a language model". Attention is not all you need if what you actually want to build is a cognitive architecture.