by visarga
547 days ago
It really does see its past tokens. Attention makes that a core feature: the model can introspect on its own past thoughts. And just as humans have to speak one word at a time (serially), so does the model. The distributed activity of neurons gets funneled through the centralizing bottleneck of serial token prediction. This parallel-to-serial process is the key; I think it's where consciousness is.
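To make the "sees its past tokens, but emits one at a time" point concrete, here is a rough toy sketch in plain Python. It is not how any particular model is implemented: `causal_attention_weights`, `generate`, and the `next_token` callback are hypothetical names made up for illustration, and the scores are just a matrix of numbers rather than real query-key products.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j], with future positions (j > i) masked out.

    Row i describes how much position i attends to each position j.
    Only j <= i is visible, so every token sees itself and its past, never its future.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]            # causal mask: drop columns j > i
        m = max(visible)                        # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights

def generate(next_token, prompt, steps):
    """Serial decoding loop: each new token is chosen from the whole past, then appended."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))       # the one-token-at-a-time bottleneck
    return tokens

if __name__ == "__main__":
    for row in causal_attention_weights([[0.0] * 3 for _ in range(3)]):
        print(row)
    # Hypothetical stand-in for a model: the "next token" is just the current length.
    print(generate(lambda toks: len(toks), [0], 3))
```

Inside one step, every position's attention row can be computed in parallel, but the outer loop can only append one token per iteration. That is the parallel-to-serial funnel in miniature.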