| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Nopoint2 396 days ago
	I don't get the point. A simple CNN with stride =1 should be able to solve it perfectly and generalize it to any size.

1 comments

It wasn't obvious that a transformer could do this, and learn to produce conv via attention

But it is, as long as the positional embedding are sufficient, i.e. use relative positional embeddings here.