| HN Mirror



	by btown 490 days ago
	Specifically, I've seen that a common failure mode of the distilled DeepSeek models is that they don't know when they're going in circles. DeepSeek incentivizes the distilled LLM to interrupt itself with "Wait.", which encourages a certain degree of reasoning, but that reasoning is far less powerful than the full model's. The distilled model can get into cycles of saying "Wait." ad infinitum, effectively second-guessing conclusions it has already reached rather than finding new nuance.

1 comment

pockmarked19 489 days ago

The full model also gets into these infinite cycles. I just tried asking it the old river-crossing boat problem, but with two goats and a cabbage, and it goes on and on forever.
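For reference, the puzzle itself is tiny when searched mechanically. The sketch below is not what I typed into the model; it assumes the usual rules carried over to this variant (the boat holds the farmer plus at most one item, and a goat left on a bank with the cabbage and no farmer eats it), since I didn't spell them out above. The names `is_safe`, `neighbors`, and `solve` are made up for illustration. With only 16 possible states, a breadth-first search has to terminate:

```python
from collections import deque

# State: (farmer, goat1, goat2, cabbage); 0 = starting bank, 1 = far bank.
START = (0, 0, 0, 0)
GOAL = (1, 1, 1, 1)


def is_safe(state):
    """Assumed rule: a goat eats the cabbage if they share a bank without the farmer."""
    farmer, goat1, goat2, cabbage = state
    for goat in (goat1, goat2):
        if goat == cabbage and goat != farmer:
            return False
    return True


def neighbors(state):
    """All safe states reachable in one crossing (farmer alone or with one item)."""
    farmer = state[0]
    other = 1 - farmer
    candidates = [(other,) + state[1:]]  # farmer crosses alone
    for i in range(1, 4):
        if state[i] == farmer:  # item is on the farmer's bank
            nxt = list(state)
            nxt[0] = other
            nxt[i] = other
            candidates.append(tuple(nxt))
    return [s for s in candidates if is_safe(s)]


def solve(start=START, goal=GOAL):
    """Breadth-first search; returns a shortest list of states, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:  # the visited set is what prevents going in circles
                parent[nxt] = state
                queue.append(nxt)
    return None


if __name__ == "__main__":
    path = solve()
    print(f"{len(path) - 1} crossings")
    for state in path:
        print(state)
```

Under those assumed rules the cabbage has to go over first, and the shortest plan is seven crossings. The `parent` dictionary doubles as a visited set, which is exactly the piece the model seems to lack: it never notices it has already been in a state.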
