by dev_throwaway
448 days ago
This is not a bad way of looking at it. If I may add a bit: the LLM is a solid-state system. The only thing that survives from one iteration to the next is the single highest-ranking token, and the entire state and "thought process" of the network cannot be represented by a single token. That means every strategy is encoded in the model during training, as a lossy representation of the training data. By definition that is a database, not a thinking system, since the strategy is stored rather than actively generated during use. The anthropomorphization of LLMs bothers me. We don't need to pretend they are alive and thinking. At best that is marketing; at worst, by training the models to output human-sounding conversations, we are actively taking away the true potential these models could achieve if we were OK with them being "simply a tool". But pretending that they are intelligent is what brings in the investors, so that is what we are doing. This paper is just furthering that agenda.
This is not true. The keys and values computed for previous tokens encode intermediate computation that later positions can access through attention, as colah3 mentioned here: https://news.ycombinator.com/item?id=43499819
You may find https://transformer-circuits.pub/2021/framework/index.html useful.
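To make that concrete, here is a toy sketch (my own illustration, not taken from colah3's comment or the linked paper). It is a two-layer, single-head attention model in pure Python with random weights, a hypothetical two-token vocabulary, and a residual connection per layer. Tokens are fed in rather than sampled, to keep it deterministic. The sketch shows two things. First, each step caches a key and a value per position per layer. Second, in the second layer the cached key for a repeated token differs depending on what came before it. So what later positions read is context-dependent intermediate computation, not just the last token.

```python
import math
import random

D = 4  # toy model width (illustrative choice, not from any real model)


def matvec(m, v):
    """Multiply a D x D matrix (list of rows) by a length-D vector."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]


def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class CachedAttentionLayer:
    """One causal self-attention head plus a residual, with a key/value cache."""

    def __init__(self, d, rng):
        def rand_matrix():
            return [[rng.gauss(0, 1 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]

        self.wq, self.wk, self.wv = rand_matrix(), rand_matrix(), rand_matrix()
        self.keys = []    # key vector for every position processed so far
        self.values = []  # value vector for every position processed so far

    def step(self, h):
        """Process one new position; earlier positions are read from the cache."""
        q = matvec(self.wq, h)
        self.keys.append(matvec(self.wk, h))
        self.values.append(matvec(self.wv, h))
        scale = 1 / math.sqrt(len(q))
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in self.keys]
        weights = softmax(scores)
        attended = [sum(w * v[i] for w, v in zip(weights, self.values)) for i in range(len(q))]
        return [x + a for x, a in zip(h, attended)]  # residual connection


rng = random.Random(0)
embeddings = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(2)]  # hypothetical vocab: tokens 0 and 1
layers = [CachedAttentionLayer(D, rng) for _ in range(2)]

tokens = [0, 1, 0]  # token 0 appears at positions 0 and 2
for tok in tokens:
    h = embeddings[tok]
    for layer in layers:
        h = layer.step(h)
    # a real model would now project h to logits and sample the next token

first, second = layers
print(len(first.keys), len(second.keys))  # one cached entry per position, per layer
print(first.keys[0] == first.keys[2])     # layer 1 keys depend only on the token
print(second.keys[0] == second.keys[2])   # layer 2 keys also depend on the prefix
```

In real inference the KV cache is an optimization. The same keys and values could be recomputed from the token prefix. Either way, each new position attends over per-position intermediate states at every layer, not only over the previously emitted token.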