by whimsicalism
2034 days ago
I work in the field; I don't need the difference explained to me.

> Think of transformers as really wide (N) and really short (1) convolutions

Modern transformer networks are not "really short", and you're also conflating intra- and inter-attention.

There is still a pitched battle being waged between convnets and transformers for sequences. Transformers seem to have the upper hand accuracy-wise right now, but convnets are competitive speed-wise.