| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ttul 1062 days ago
	Hypothetically, fine tuned transformer networks would learn to attend more to the beginning and end of the sequence since that is where the instructions typically are. And lo and behold a recent paper demonstrated this to be true.