by whimsicalism 2034 days ago
I work in the field; I don't need the difference explained to me.

> Think of transformers as really wide (N) and really short (1) convolutions

Modern transformer networks are not "really short", and you're also conflating intra- and inter-attention.
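For readers following along, here is a rough sketch of that distinction, assuming "intra-attention" means self-attention (queries, keys, and values all come from the same sequence) and "inter-attention" means cross-attention (queries come from one sequence, keys and values from another). The `attention` helper, the toy `source` and `target` vectors, and the omission of learned projections are all simplifications made up for illustration, not anything from the thread.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-width vectors.

    Hypothetical helper: no learned Q/K/V projections, no heads, no masking.
    """
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    out = []
    for q in queries:
        # One score per key, then normalize the scores into weights.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(width)])
    return out

# Toy sequences (made-up numbers): 3 source positions, 2 target positions.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[0.5, 0.5], [1.0, -1.0]]

# Intra-attention: the sequence attends to itself.
intra = attention(source, source, source)

# Inter-attention: target positions attend to source positions.
inter = attention(target, source, source)
```

In both cases there is one output per query, so `intra` has three rows and `inter` has two; the only thing that changes is where the queries come from.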

There is still a pitched battle being waged between convnets and transformers for sequences. It looks like transformers have the upper hand accuracy-wise right now, but convnets are competitive speed-wise.