by HarHarVeryFunny
656 days ago
You are confusing the number of sequential steps with the total amount of compute spent. The input sequence is processed in parallel regardless of its length, so the number of tokens has no impact on the number of sequential compute steps, which is always N = the number of layers.

> Do you know what a "language model" is capable of in the limit?

Well, yeah, if the language model is an N-layer transformer ...
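To make the depth-vs-compute distinction concrete, here is a toy sketch (not a real transformer): uniform causal averaging stands in for attention, and `forward`, `layers` and the multiply-add tally are hypothetical names chosen for illustration. The only sequential loop runs over layers; all T positions are updated together inside each layer, so the step count stays at N while the work grows with T.

```python
import numpy as np


def forward(x: np.ndarray, layers: list[np.ndarray]) -> tuple[np.ndarray, int, int]:
    """Toy N-layer model: each layer updates all T token positions at once.

    Returns the output, the number of sequential steps (one per layer),
    and a rough count of multiply-adds (which grows with T).
    """
    T, d = x.shape
    sequential_steps = 0
    mult_adds = 0
    # Causal mask: position i may only look at positions <= i.
    mask = np.tril(np.ones((T, T)))
    # Uniform causal averaging, used here as a stand-in for attention.
    weights = mask / mask.sum(axis=1, keepdims=True)
    for W in layers:  # the only sequential loop: one iteration per layer
        mixed = weights @ x          # every position mixes in parallel
        x = x + np.tanh(mixed @ W)   # per-position update, also parallel
        sequential_steps += 1
        mult_adds += T * T * d + T * d * d
    return x, sequential_steps, mult_adds


rng = np.random.default_rng(0)
d, N = 8, 6
layers = [rng.normal(size=(d, d)) * 0.1 for _ in range(N)]
for T in (4, 64):
    _, steps, work = forward(rng.normal(size=(T, d)), layers)
    print(T, steps, work)  # steps is 6 for both lengths; work grows with T
```

Going from 4 to 64 tokens multiplies the work by roughly 100x here, but the number of sequential steps stays at 6.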
Then increase N (N is almost always increased when a model is scaled up) and train, or write things down and continue.
A limitless iteration machine (without external aid) is currently fiction. Brains can't do it, so I'm not particularly worried if machines can't either.
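A toy illustration of "increase N, or write things down and continue" follows. It rests on a loud assumption that is not how real models work: each layer can perform exactly one unit of sequential work, here one increment toward a target count. `one_pass` and `solve_with_scratchpad` are hypothetical names. A single pass gets at most N steps; feeding each pass's written-down result back in gets the rest, at the cost of more passes.

```python
def one_pass(value: int, num_layers: int, target: int) -> int:
    """Hypothetical model pass: each layer advances the running value by 1,
    so a single N-layer pass moves at most N steps toward the target."""
    for _ in range(num_layers):  # layers run one after another
        if value < target:
            value += 1
    return value


def solve_with_scratchpad(target: int, num_layers: int) -> tuple[int, int]:
    """Write each pass's result down (as if emitting a token) and feed it back in.

    Returns (final value, number of forward passes used).
    """
    value, passes = 0, 0
    while value < target:
        value = one_pass(value, num_layers, target)  # result is "written down"
        passes += 1
    return value, passes


print(one_pass(0, num_layers=6, target=20))               # one pass alone stops at 6
print(solve_with_scratchpad(target=20, num_layers=6))     # (20, 4): more passes
print(solve_with_scratchpad(target=20, num_layers=24))    # (20, 1): bigger N, one pass
```

Either lever (a larger N or more written-down passes) reaches the target. Without one of them, the fixed depth caps how far a single pass can get.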