by talldayo
620 days ago
> What transformers do is learn how to move data around in a context-relevant manner.

This is a misrepresentation of how transformers behave and I think you should double-check the definition before dunking on other people's works.
Also, the link I gave regarding induction heads is explicitly moving data in the context window forward.