|
|
|
|
|
by deliciousturkey
5 days ago
|
|
I dislike the non-specificity of "models" here. Different models have different attention architectures, and can therefore have significant differences in long-context behavior. It's true that long context is an issue can most models do drop off in quality, but I would not extrapolate behavior of old models to new ones. |
|