Hacker News new | ask | show | jobs
by Nopoint2 396 days ago
I don't get the point. A simple CNN with stride =1 should be able to solve it perfectly and generalize it to any size.
1 comments

It wasn't obvious that a transformer could do this, and learn to produce conv via attention
But it is, as long as the positional embedding are sufficient, i.e. use relative positional embeddings here.