Y
Hacker News
new
|
ask
|
show
|
jobs
by
Nopoint2
396 days ago
I don't get the point. A simple CNN with stride =1 should be able to solve it perfectly and generalize it to any size.
1 comments
montebicyclelo
396 days ago
It wasn't obvious that a transformer could do this, and learn to produce conv via attention
link
artemisart
396 days ago
But it is, as long as the positional embedding are sufficient, i.e. use relative positional embeddings here.
link