|
|
|
|
|
by therealpygon
146 days ago
|
|
Context quite literally degrades performance of attention with size in non-needle-in-haystack lookups in almost every model to varying degrees. Thus to answer the question, the “waste” is making the model dumber unnecessarily in an attempt to make it smarter. |
|