| HN Mirror



	by deliciousturkey 5 days ago
	I dislike the non-specificity of "models" here. Different models have different attention architectures, and can therefore differ significantly in long-context behavior. It's true that long context is an issue and most models do drop off in quality, but I would not extrapolate the behavior of old models to new ones.

1 comment

Could you explain how? What changed with attention mechanisms to allow for such a shift?