|
|
|
|
|
by ttul
1062 days ago
|
|
Hypothetically, fine tuned transformer networks would learn to attend more to the beginning and end of the sequence since that is where the instructions typically are. And lo and behold a recent paper demonstrated this to be true. |
|