by talldayo 620 days ago
> What transformers do is learn how to move data around in a context-relevant manner.

This misrepresents how transformers behave, and I think you should double-check the definition before dunking on other people's work.

1 comments

It's not a misrepresentation. Attention computes association matrices that bind positions in the context window, and those associations are context-sensitive. Binding positions through an association matrix is an implementation of routing, which is just moving data.
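To make the routing point concrete, here is a minimal sketch of a single causal attention head in plain Python. Everything in it is an assumption for illustration: the names (`causal_attention`, `one_hot`, `targets`), the hand-picked one-hot queries and keys, and the large `scale` that pushes the softmax close to a hard selection. A trained model learns its projections rather than having them set by hand.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values, scale):
    # weights[i][j] is the association matrix: how much position i reads from position j.
    n = len(queries)
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            if j <= i:
                scores.append(scale * sum(q * k for q, k in zip(queries[i], keys[j])))
            else:
                scores.append(float("-inf"))  # causal mask: no reading from the future
        weights.append(softmax(scores))
    dim = len(values[0])
    outputs = [
        [sum(weights[i][j] * values[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]
    return weights, outputs

def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]

# Hypothetical setup: each position's key names that position, and each
# position's query names the earlier position it wants to read from.
n = 4
keys = [one_hot(j, n) for j in range(n)]
targets = [0, 0, 1, 0]
queries = [one_hot(t, n) for t in targets]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, -1.0]]

weights, outputs = causal_attention(queries, keys, values, scale=20.0)
for i, out in enumerate(outputs):
    print(i, "reads from", targets[i], [round(x, 3) for x in out])
```

With these settings, each output row ends up approximately equal to the value vector at the position its query pointed to. Change `targets` (that is, change the context) and the same code moves different data to different places.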

Also, the induction heads described in the link I gave explicitly move data forward in the context window.
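As a rough illustration of that copy-forward behavior, the sketch below hard-codes the pattern usually attributed to induction heads: given `... A B ... A`, look back to the earlier `A` and copy the `B` that followed it. The function name `induction_head_predict` is hypothetical, and this is a hand-written stand-in for the behavior, not learned attention and not code from the linked material.

```python
def induction_head_predict(tokens):
    """Toy stand-in for an induction head: find the most recent earlier
    occurrence of the last token and copy forward the token that followed it."""
    current = tokens[-1]
    # Scan backwards over earlier positions only.
    for j in range(len(tokens) - 2, -1, -1):
        if tokens[j] == current:
            return tokens[j + 1]
    return None  # no earlier occurrence, nothing to copy

print(induction_head_predict(["the", "cat", "sat", "the"]))
```

The predicted token is not produced from the weights alone; it is read out of an earlier position in the context and carried forward to the current one.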

I consider aggregate routing to be distinct from moving data. If the context is temporary, then the "data" (weights and tokenizer) stays in place. LLMs are static: they do not move data so much as infer from it.