Hacker News
by zarzavat 757 days ago
It’s not about arithmetic but about embeddings. The positional embeddings used in transformers are rather simplistic. If they can add this one new capability to transformers just by using different embeddings, then maybe other capabilities are within reach too.
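For reference, here is a minimal sketch of the fixed sinusoidal positional encoding from the original Transformer paper, which shows what "rather simplistic" can mean. Each token's vector depends only on its absolute index in the sequence. Nothing in it tells the model which digit of which number a token is. The function name `sinusoidal_positions` is illustrative, and the sketch assumes an even `d_model`.

```python
import math


def sinusoidal_positions(seq_len: int, d_model: int) -> list[list[float]]:
    """Fixed sinusoidal encodings: row p is added to the token embedding at position p.

    Illustrative sketch; assumes d_model is even.
    """
    if d_model % 2:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            # Each sin/cos pair uses a wavelength that grows geometrically with i.
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


table = sinusoidal_positions(4, 8)
```

Position 0 always maps to alternating 0.0 and 1.0. In a string like `1234+5678`, the units digits sit at indices 3 and 8, so the encoding gives no direct signal that they should be aligned.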

No, because those embeddings only work well for addition (and only very weakly for multiplication and sorting). Imagine needing a specially crafted bias for every single task. The Deep Learning revolution brought on by Convolutional Neural Nets was supposed to remove the need for exactly that.
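Here is a rough illustration of what a "specially crafted" positional scheme looks like. This minimal sketch assigns arithmetic-specific position ids, loosely in the spirit of the digit-aligned embeddings under discussion. The function name `digit_position_ids` and the convention of writing numbers least-significant digit first are illustrative assumptions. This is not the published method.

```python
def digit_position_ids(text: str) -> list[int]:
    """Hypothetical task-specific position ids for arithmetic strings.

    Each digit gets its 1-based offset inside its run of consecutive digits,
    and every non-digit character gets 0. If numbers are written
    least-significant digit first, digits with the same place value share an id.
    """
    ids = []
    run = 0
    for ch in text:
        if ch in "0123456789":
            run += 1          # next digit of the current number
            ids.append(run)
        else:
            run = 0           # operator or separator ends the number
            ids.append(0)
    return ids


print(digit_position_ids("4321+8765="))  # [1, 2, 3, 4, 0, 1, 2, 3, 4, 0]
```

The scheme only encodes structure that matters for digit-wise addition. Sorting, or any other task, would need its own hand-designed indexing, and that is the objection above.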