by montebicyclelo 396 days ago
It wasn't obvious that a transformer could do this, i.e. learn to implement a convolution via attention.

But it is, as long as the positional embeddings are sufficient, i.e. as long as you use relative positional embeddings here.
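To make that concrete, here is a rough NumPy sketch (not taken from the linked work) of how attention plus a relative positional term can reproduce a 1D convolution. The assumptions are mine: one head per kernel tap, content logits set to zero, a hand-built relative bias instead of a learned one, and circular (wrap-around) boundaries so every offset has a valid target. The names `relative_bias`, `attention_conv1d` and `circular_conv1d` are hypothetical. As in most deep learning libraries, "conv" here means cross-correlation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_bias(n, offset, strength=1e9):
    # Assumed stand-in for a learned relative positional bias:
    # logit 0 where (j - i) mod n equals the offset, very negative elsewhere.
    idx = np.arange(n)
    rel = (idx[None, :] - idx[:, None]) % n
    return np.where(rel == offset % n, 0.0, -strength)

def attention_conv1d(x, kernel):
    # One "head" per kernel tap: each head attends to a single relative
    # offset, and its value projection scales the input by the tap weight.
    n = x.shape[0]
    k = len(kernel)
    offsets = np.arange(k) - k // 2
    out = np.zeros_like(x, dtype=float)
    for w, off in zip(kernel, offsets):
        attn = softmax(relative_bias(n, off))  # content logits are zero
        out += attn @ (w * x)
    return out

def circular_conv1d(x, kernel):
    # Reference: circular cross-correlation, out[i] = sum_t w[t] * x[i + off_t].
    k = len(kernel)
    offsets = np.arange(k) - k // 2
    return sum(w * np.roll(x, -off) for w, off in zip(kernel, offsets))

x = np.arange(8.0)
kernel = [1.0, 2.0, 3.0]
print(np.allclose(attention_conv1d(x, kernel), circular_conv1d(x, kernel)))
```

The point of the sketch is that the attention pattern depends only on `j - i`, which is exactly what a relative positional embedding can express. With absolute embeddings the model would have to learn that same shift-invariant pattern separately for every position.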