HN Mirror

by Der_Einzige 3 hours ago

A lot of the perception that open source models are garbage comes from the fact that they're still run with the same piss-poor sampling algorithms that OpenAI/Anthropic force on their users, i.e. top-p and top-k.

These lead to a gradual accumulation of small sampling errors, which makes it all but inevitable that open source models will shit the bed by the 200K token mark or even sooner.
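To make the compounding concrete, here's a back-of-the-envelope sketch. It assumes each sampled token has some small, independent chance `eps` of being a "bad" pick from the tail of the distribution. The independence assumption and the `eps` values are mine, purely for illustration, and the function name is made up.

```python
def p_clean_run(eps: float, n_tokens: int) -> float:
    """Chance that none of n_tokens samples is a 'bad' token.

    Assumes an independent per-token error rate eps (an illustrative
    simplification, not a measured number).
    """
    return (1.0 - eps) ** n_tokens


if __name__ == "__main__":
    # Hypothetical per-token error rates, evaluated at a 200K-token run.
    for eps in (1e-4, 1e-5, 1e-6):
        p = p_clean_run(eps, 200_000)
        print(f"eps={eps:g}: P(no bad token in 200K tokens) = {p:.3g}")
```

Even a tiny per-token error rate multiplies out over a long generation, which is the whole reason truncating the tail properly matters more at 200K tokens than at 2K.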

If you set your opencode to use a good sampling algorithm, such as min_p or top-n sigma (llama.cpp supports both), you'll find that, at least for long-running tasks, your model gets a lot better.
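For anyone who hasn't looked at what these two actually do, here's a minimal pure-Python sketch of the filtering step as I understand it. min_p keeps tokens whose probability is at least `min_p` times the top token's probability. Top-n sigma keeps tokens whose logit is within `n` standard deviations of the max logit. The function names and default values are mine, and this is not llama.cpp's implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_keep(logits, min_p=0.05):
    """Indices whose probability is >= min_p * (probability of the top token)."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def top_n_sigma_keep(logits, n=1.0):
    """Indices whose logit is >= max_logit - n * std(logits)."""
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * std
    return [i for i, x in enumerate(logits) if x >= cutoff]


def sample(logits, keep, rng=random):
    """Renormalise over the kept indices and draw one token id."""
    probs = softmax([logits[i] for i in keep])
    return rng.choices(keep, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Toy logits: two plausible tokens and a low-probability tail.
    logits = [5.0, 4.0, 0.0, -1.0, -2.0]
    print("min_p keeps:", min_p_keep(logits))
    print("top-n sigma keeps:", top_n_sigma_keep(logits))
    print("sampled token id:", sample(logits, min_p_keep(logits)))
```

Both cutoffs scale with how confident the model is at that step, unlike a fixed top-k count or a fixed top-p mass. In llama.cpp I believe the corresponding flags are `--min-p` and `--top-nsigma`, but check `--help` on your build rather than trusting my memory.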

It won't make GLM as good as Opus 4.8, but it will stop the feeling of "brain damage" from running open source models at the edge of their context windows.

And yes, there is an upcoming (hopefully NeurIPS) paper titled "Long Context Generation is a Sampling Problem" with more details on this. Give it two months and it'll be on arXiv one way or another.