Hacker News new | ask | show | jobs
by ttul 1062 days ago
Hypothetically, fine tuned transformer networks would learn to attend more to the beginning and end of the sequence since that is where the instructions typically are. And lo and behold a recent paper demonstrated this to be true.