Hacker News
by hackinthebochs 620 days ago
It's not a misinterpretation. Attention computes association matrices that bind locations in the context window to one another, and these associations are context-sensitive. But binding locations through an association matrix is an implementation of routing, and routing is just moving data.
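
To make the "association matrix as routing" reading concrete, here is a minimal sketch of single-head dot-product attention in plain Python. It is only an illustration under stated assumptions: the toy `queries`, `keys`, and `values` are hypothetical, hand-picked so the association matrix is nearly one-hot, and the learned projections and causal mask of a real transformer are left out.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    # Association matrix: row i says how strongly position i binds to each position j.
    assoc = [
        softmax([sum(q * k for q, k in zip(qi, kj)) / scale for kj in keys])
        for qi in queries
    ]
    # Routing: each output is a weighted mix of the value vectors at other positions.
    out = [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        for row in assoc
    ]
    return assoc, out

# Hypothetical toy context with 3 positions (values chosen for illustration only).
keys = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
# Position 0 queries position 2, position 1 queries position 0, position 2 queries position 1.
queries = [[0.0, 0.0, 10.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]

assoc, out = attention(queries, keys, values)
for i, row in enumerate(out):
    print(i, [round(x, 3) for x in row])
```

With these inputs, each output row ends up essentially equal to the value vector stored at the position its query matched, which is the sense in which the association matrix moves data between locations. Change the queries and the same values get routed elsewhere, which is the context-sensitive part.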

Also, the link I gave on induction heads explicitly describes moving data forward in the context window.
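
As a rough picture of what "moving data forward" means for an induction head, here is a hard-coded caricature of the [A][B] ... [A] -> [B] pattern. This is an assumption-laden sketch, not how a trained model implements it: a real induction circuit is learned and works through soft attention over vectors, while the hypothetical `induction_copy` below uses exact token matching.

```python
def induction_copy(tokens):
    """Predict the next token via the pattern [A][B] ... [A] -> [B]."""
    current = tokens[-1]
    # Scan backwards for the most recent earlier occurrence of the current token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy the token that followed that occurrence forward to the current position.
            return tokens[i + 1]
    return None  # no earlier occurrence, so there is nothing to copy

prediction = induction_copy(["the", "cat", "sat", "on", "the"])
print(prediction)
```

The point of the toy is only that the prediction is a token lifted from an earlier position in the context and delivered to the current one.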

1 comment

I consider aggregate routing to be distinct from moving data. If the context is temporary, then the "data" (weights and tokenizer) stays in place. LLMs are static; they do not move data so much as infer from it.