
by in-silico 17 days ago

This is really just semantics, but I wouldn't call attending to the KV cache "re-reading the context."

The model takes in the context, encodes it into a "memory" (the KV cache), and accesses that memory later. That fact doesn't change just because the KV cache grows in size with the context.
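
As a toy illustration of that encode-then-retrieve view, here is a minimal single-head sketch in plain Python. It is not how any real implementation is written: `KVCache`, `encode`, and `retrieve` are made-up names, and the key/value vectors are passed in directly, whereas a real transformer would compute them from each token's hidden state with learned projections.

```python
import math


class KVCache:
    """Toy single-head attention memory: encode each token once, retrieve later."""

    def __init__(self):
        self.keys = []    # one key vector per token seen so far
        self.values = []  # one value vector per token seen so far

    def encode(self, key, value):
        # Write to memory. The token itself is not kept, only its key/value pair.
        # (Assumption: key and value are given; a real model would project them.)
        self.keys.append(key)
        self.values.append(value)

    def retrieve(self, query):
        # Read from memory: scaled dot-product scores, softmax, weighted sum of values.
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in self.keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values)) / total
            for i in range(dim)
        ]


cache = KVCache()
cache.encode([1.0, 0.0], [10.0, 0.0])  # first token: stored once
cache.encode([0.0, 1.0], [0.0, 10.0])  # second token: the cache grows by one entry
out = cache.retrieve([5.0, 0.0])       # a later query reads the cache, not the tokens
```

The cache grows by one entry per token, but `retrieve` never touches the original tokens again, only their stored encodings.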

I don't know what memory would look like other than an encode-retrieve loop.

Relevant: Transformers are Multi-State RNNs - https://arxiv.org/abs/2401.06104