by storus
192 days ago
Linear attention is really bad. It's only good for benchmaxing, and it comes at the cost of valuable granularity, which you can feel in the latest DeepSeek randomly forgetting, ignoring, or "correcting" facts that were explicitly stated in the prompt.
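To make the granularity point concrete, here is a rough sketch of the general idea, not of DeepSeek's actual implementation. It assumes the textbook kernelized form of linear attention with an elu(x) + 1 feature map; the names `phi`, `LinearAttentionState`, and `num_floats` are made up for illustration. Softmax attention keeps every key/value pair around and scores each one against the query, while linear attention folds the whole history into one fixed-size summary.

```python
import math

def softmax_attention(q, keys, values):
    """Exact attention: every key/value pair is kept and scored against q."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(d_v)]

def phi(x):
    """Feature map elu(x) + 1 (assumed here); keeps all features positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

class LinearAttentionState:
    """Running summary S = sum phi(k) v^T and z = sum phi(k).

    The state has d_k * d_v + d_k floats no matter how many tokens were seen,
    so individual tokens are no longer separately addressable.
    """

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def update(self, k, v):
        # Fold one (key, value) pair into the fixed-size summary.
        fk = phi(k)
        for i, fki in enumerate(fk):
            self.z[i] += fki
            for j, vj in enumerate(v):
                self.S[i][j] += fki * vj

    def read(self, q):
        # Output = phi(q)^T S / (phi(q)^T z), a normalized weighted average.
        fq = phi(q)
        denom = sum(a * b for a, b in zip(fq, self.z))
        d_v = len(self.S[0])
        return [sum(fq[i] * self.S[i][j] for i in range(len(fq))) / denom
                for j in range(d_v)]

    def num_floats(self):
        return len(self.S) * len(self.S[0]) + len(self.z)
```

With `d_k = 4` and `d_v = 2` the state is 12 floats after 10 tokens and still 12 floats after 10,000, whereas `softmax_attention` needs the full `keys` and `values` lists. That compression is where fine-grained recall of a single stated fact can plausibly get blurred.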