by in-silico
17 days ago
This is really just semantics, but I wouldn't describe attending to the KV cache as re-reading the context. The model takes in the context, encodes it into a "memory" (the KV cache), and accesses that memory later. That doesn't change just because the KV cache grows with the context. I don't know what memory would look like other than an encode-retrieve loop. Relevant: Transformers are Multi-State RNNs - https://arxiv.org/abs/2401.06104
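
To make the encode-retrieve framing concrete, here is a toy sketch in plain Python. It is only an illustration under heavy assumptions: a single attention head, a single layer, and identity projection weights. The names `KVCache`, `encode`, and `retrieve` are made up for this example and do not come from any library.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class KVCache:
    """Toy single-head attention memory (hypothetical, for illustration only)."""

    def __init__(self, w_k, w_v):
        self.w_k, self.w_v = w_k, w_v
        self.keys, self.values = [], []
        self.encode_calls = 0

    def encode(self, x):
        # Each token embedding is projected exactly once, then stored.
        self.keys.append(matvec(self.w_k, x))
        self.values.append(matvec(self.w_v, x))
        self.encode_calls += 1

    def retrieve(self, q):
        # Softmax attention over the stored keys; the raw tokens are never
        # touched again, only their cached encodings.
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in self.keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values))
                for i in range(dim)]


identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed weights, chosen for readability
cache = KVCache(identity, identity)
for token in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    cache.encode(token)

# Two different queries read the same memory without re-encoding anything.
out_a = cache.retrieve([10.0, 0.0])
out_b = cache.retrieve([0.0, 10.0])
```

The memory grows by one entry per token, but `encode_calls` stays at three no matter how many queries are issued afterwards, which is the sense in which retrieval is not re-reading.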