|
|
|
|
|
by wizzwizz4
238 days ago
|
|
They rebuild the "long term plan" anew for every token: there's no guarantee that the reconstructed plan will remain similar between tokens. That's not how planning normally works. (You can find something like this every time there's this kind of gross inefficiency, which is why I gave the general principle.) |
|
Well no, there is attention in the LLM which allows it to look back at it's "internal thought" during the previous tokens.
Token T at layer L, can attend to a projection of the hidden states of all tokens < T at L. So its definitely not starting anew at every token and is able to iterate on an existing plan.
Its not a perfect mechanism for sure, and there is work to make LLMs able to carry more information forward (e.g. feedback transformers), but they can definitely do some of that today.