Hacker News

by hackinthebochs 620 days ago

Computation is, in general, partly a matter of moving data around, yes. What transformers do is learn how to move data around in a context-relevant manner. This greatly increases the expressivity of the computations they can perform compared with traditional deep nets.
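The "context-relevant data movement" claim can be illustrated with single-head scaled dot-product attention. This is a minimal pure-Python sketch, assuming no learned query/key/value projections, no masking, and hypothetical toy inputs. The keys and values stay fixed, and only the query changes, yet each query ends up reading from a different position.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention, no learned projections or mask."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key decides where data is read from.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the weight-averaged value vector: data pulled from attended positions.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# Hypothetical toy inputs: two positions with distinct keys and values.
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

# Same keys and values, different queries: each query pulls from a different position.
outputs, weights = attention([[1.0, 0.0], [0.0, 1.0]], keys, values)
for w in weights:
    print([round(x, 3) for x in w])
```

In a real transformer the queries, keys, and values are learned linear projections of the token representations, so which position gets read from is itself a learned function of the context.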

https://lilianweng.github.io/posts/2018-06-24-attention/

https://transformer-circuits.pub/2022/in-context-learning-an...

https://transformer-circuits.pub/2021/framework/index.html#r...


talldayo 620 days ago

> What transformers do is learn how to move data around in a context-relevant manner.

This is a misrepresentation of how transformers behave, and I think you should double-check the definition before dunking on other people's work.


hackinthebochs 620 days ago

It's not a misrepresentation. What attention does is discover association matrices that bind locations in the context window, and these associations are context-sensitive. But binding locations through an association matrix is an implementation of routing, which is just moving data.
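A minimal sketch of the "association matrix as routing" point, assuming attention weights can be treated as a row-stochastic matrix `A` applied to per-position value vectors. The helper names `apply_association` and `one_hot_rows` and the toy binding `src` are hypothetical. When each row of `A` is one-hot, multiplying by it is exactly a gather, i.e. copying one position's data to another. Real softmax attention is never exactly one-hot, so this is the limiting case of a sharply peaked head, and a soft row simply blends several positions.

```python
def apply_association(A, values):
    """Compute A @ values, where row i of A says how much position i reads from each j."""
    n_cols = len(values[0])
    return [[sum(a * v[c] for a, v in zip(row, values)) for c in range(n_cols)]
            for row in A]

def one_hot_rows(src, n):
    """Build a hard association matrix whose row i selects position src[i]."""
    return [[1.0 if j == s else 0.0 for j in range(n)] for s in src]

# Hypothetical per-position data (e.g. value vectors at three context positions).
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Hypothetical binding: position 0 reads from 2; positions 1 and 2 read from 0.
src = [2, 0, 0]
A = one_hot_rows(src, len(values))

routed = apply_association(A, values)   # matrix product with a hard association matrix
gathered = [values[s] for s in src]     # plain copying of data between positions
print(routed == gathered)

# A soft row mixes positions instead of copying a single one.
soft = apply_association([[0.5, 0.5, 0.0]], values)
print(soft)
```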

Also, the induction-head mechanism in the link I gave explicitly moves data forward in the context window.
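The induction-head pattern from the linked in-context-learning work is usually summarized as "[A][B] ... [A] → predict [B]": a previous-token head records at each position which token came just before it, and an induction head then attends to the earlier position whose recorded previous token matches the current token and copies that position's token forward. The sketch below is a hand-written caricature under that assumption, using hard "argmax" lookups in place of soft attention; the function name `induction_predict` is hypothetical and this is not the learned circuit itself.

```python
def induction_predict(tokens):
    """Toy two-step induction pattern: [A][B] ... [A] -> [B]. Returns None if no match."""
    # Step 1 (previous-token head stand-in): position j records tokens[j - 1].
    prev = [None] + tokens[:-1]
    current = tokens[-1]
    # Step 2 (induction head stand-in): scan earlier positions, most recent first,
    # excluding the final (query) position, for one whose recorded previous token
    # equals the current token, then copy the token stored at that position.
    for j in range(len(tokens) - 2, 0, -1):
        if prev[j] == current:
            return tokens[j]
    return None

print(induction_predict(["the", "cat", "sat", "on", "the"]))
```

The copied token comes from an earlier position and lands at the current one, which is the sense in which the mechanism moves data forward in the context window.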


talldayo 620 days ago

I consider aggregate routing to be distinct from moving data. If the context is temporary, then the "data" (weights and tokenizer) stays in place. LLMs are static; they do not move data so much as infer from it.
