|
|
|
|
|
by Y_Y
323 days ago
|
|
> LLMs are explicitly trained to pay attention to the entire text I'd respectfully disagree on this point. The magic of attention in transformers is the selective attention applied, which ideally only gives significant weight to the tokens relevant to the query. |
|