by sp332
1062 days ago
With transformer models, it’s common to put instructions and system messages at the beginning of the input. But with this decay, the beginning of the input would always get the sparsest attention, right? Maybe the instructions should be moved to the end. Then again, if it’s recurrent, you might want to prime it with a description of the task first.
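
A rough sketch of why the start of the input would lose out, assuming the decay is exponential in token distance (a weight of `gamma ** distance`). That form is an assumption for illustration, not something stated above, and `decay_weights` and `gamma` are hypothetical names:

```python
def decay_weights(seq_len, gamma):
    """Weight the final position assigns to each earlier position.

    Assumes a hypothetical exponential decay: gamma ** (distance to the last token).
    """
    last = seq_len - 1
    return [gamma ** (last - m) for m in range(seq_len)]


# Six tokens, with an illustrative decay factor of 0.5 per step.
weights = decay_weights(seq_len=6, gamma=0.5)

for position, weight in enumerate(weights):
    print(f"position {position}: weight {weight}")
```

Under that assumption, position 0 (where a system prompt would normally sit) gets the smallest weight and the most recent token gets the largest, which is the concern about instructions placed at the beginning.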