| HN Mirror



	by montebicyclelo 396 days ago
	It wasn't obvious that a transformer could do this and learn to reproduce convolution via attention.

1 comment

But it is, as long as the positional embeddings are sufficient, i.e. if relative positional embeddings are used here.
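To make that concrete, here is a rough sketch (not anyone's actual model, just an illustration) of why relative positions are enough. It assumes one attention head per kernel tap, with logits that depend only on the relative offset `j - i` and are sharp enough that the softmax is nearly one-hot. The function names and the `sharpness` parameter are made up for this example.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def relative_attention_head(x, offset, sharpness=50.0):
    """One head whose logits depend only on the relative position j - i."""
    n = len(x)
    out = []
    for i in range(n):
        # Logit is 0 at the target offset and strongly negative elsewhere,
        # so the softmax puts almost all weight on position i + offset.
        logits = [-sharpness * abs((j - i) - offset) for j in range(n)]
        weights = softmax(logits)
        out.append(sum(w * xj for w, xj in zip(weights, x)))
    return out

def attention_conv1d(x, kernel):
    """Sum of heads, one per kernel tap, each scaled by its kernel weight."""
    half = len(kernel) // 2  # assumes an odd-length kernel
    y = [0.0] * len(x)
    for k, w in enumerate(kernel):
        head = relative_attention_head(x, offset=k - half)
        for i in range(len(x)):
            y[i] += w * head[i]
    return y

def direct_conv1d(x, kernel):
    """Reference sliding-window sum over positions where the kernel fits."""
    half = len(kernel) // 2
    return [sum(w * x[i + k - half] for k, w in enumerate(kernel))
            for i in range(half, len(x) - half)]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 5.0]
kernel = [0.5, 1.0, -0.25]
half = len(kernel) // 2
via_attention = attention_conv1d(x, kernel)[half:len(x) - half]
reference = direct_conv1d(x, kernel)
```

Away from the borders, `via_attention` and `reference` agree up to a tiny softmax leakage. At the edges the target position does not exist, so the head falls back to the nearest position, which is why only the interior is compared. With absolute position embeddings alone, a head would have to learn this offset pattern separately for every position.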