HN Mirror



	by fwipsy 1 hour ago
	I think this is predicted? Part of the story is how they were able to preserve core reasoning ability while cutting knowledge like "pelicans have wings."

	> these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios.


pylotlight 1 hour ago

The only truly essential item here is tool-calling capability, is it not? So I assume they tested for strong consistency with read/write/edit tools?


nsingh2 1 hour ago

This model doesn't support tool calling; it was not part of its training. It's focused on Python (and I think C++) competitive programming and mathematics tasks, i.e. tasks with verifiable rewards. So if you have a task that fits that description, the size-to-capability ratio is good.

These kinds of models might be more useful as tools called by larger orchestrator models than as orchestrators themselves.
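
A minimal sketch of that arrangement, assuming the orchestrator can label each task by kind. Everything here is hypothetical and for illustration only: the `Task` type, the `VERIFIABLE_KINDS` set, the `route` function, and the two stand-in callables are not from the paper or any real API.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: task kinds with verifiable rewards, matching what the
# small model is described as being trained on.
VERIFIABLE_KINDS = {"competitive_programming", "math"}


@dataclass
class Task:
    kind: str    # hypothetical label assigned by the orchestrator
    prompt: str


def route(
    task: Task,
    specialist: Callable[[str], str],
    generalist: Callable[[str], str],
) -> str:
    """Send verifiable tasks to the small specialist, the rest to the large model."""
    if task.kind in VERIFIABLE_KINDS:
        return specialist(task.prompt)
    return generalist(task.prompt)


# Stand-ins for real model calls; actual code would hit model endpoints here.
def fake_specialist(prompt: str) -> str:
    return f"[specialist] {prompt}"


def fake_generalist(prompt: str) -> str:
    return f"[generalist] {prompt}"


if __name__ == "__main__":
    print(route(Task("math", "Sum the integers from 1 to 100."), fake_specialist, fake_generalist))
    print(route(Task("trivia", "Do pelicans have wings?"), fake_specialist, fake_generalist))
```

The point of the split is that the large model keeps the broad knowledge and the tool use, and only hands off the narrow, checkable subproblems.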


btown 1 hour ago

I'm not seeing any mention of tools in the paper, much less a bias towards "curiosity" to use those tools when it encounters gaps in its knowledge. So perhaps this is a good proof-of-concept that single-pass code generation is viable with this small a model - but we're still a long way from a viable solution.
