| Transformers produce the next token by manipulating K hidden vectors per layer, one vector per preceding token. So yes, you can increase compute length arbitrarily by adding tokens. Those tokens don't have to carry any information to work. https://arxiv.org/abs/2310.02226 And again, human brains are clearly limited in the number of steps they can compute without writing something down.
Limited ≠ trivial. >FYI, "attention is all you need" has the implicit context of "if all you want to build is a language model". Great. Do you know what a "language model" is capable of in the limit? No. These top research labs aren't only working on Transformers as they currently exist, but it doesn't make much sense to abandon a golden goose before it has hit a wall. |
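To put rough numbers on the "more tokens, more compute" point, here is a toy count, assuming a decoder-only model with causal attention. The function names and the depth of 12 layers are made up for illustration and don't describe any particular model.

```python
def hidden_vector_count(n_tokens: int, n_layers: int) -> int:
    """Hidden vectors in play: one per token per layer."""
    return n_tokens * n_layers


def causal_attention_pairs(n_tokens: int, n_layers: int) -> int:
    """Query-key pairs scored per head over the whole sequence.

    With causal masking, the token at position i attends to i positions
    (itself included), so one layer scores 1 + 2 + ... + n pairs.
    """
    return n_layers * n_tokens * (n_tokens + 1) // 2


if __name__ == "__main__":
    n_layers = 12  # hypothetical depth, chosen only for the example
    for n_tokens in (10, 20, 40):
        print(
            n_tokens,
            hidden_vector_count(n_tokens, n_layers),
            causal_attention_pairs(n_tokens, n_layers),
        )
```

Padding the context from 10 to 20 tokens doubles the hidden vectors per layer, while the number of layers each token passes through stays at `n_layers`.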
No, there is a loop between the cortex and the thalamus that feeds the outputs of the cortex back in as inputs. Our brains can iterate for as long as they like before initiating any motor output (such as writing something down), if they initiate one at all.