Hacker News
by btown 490 days ago
Specifically, I've seen that a common failure mode of the distilled Deepseek models is that they don't know when they're going in circles. Deepseek incentivizes the distilled LLM to interrupt itself with "Wait.", which encourages a certain degree of reasoning, but it's far less powerful than the reasoning of the full model, and it can get stuck in cycles of saying "Wait." ad infinitum, effectively second-guessing conclusions it has already reached rather than finding new nuance.
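To make "going in circles" a bit more concrete, here's a rough sketch of how one might flag it in a saved reasoning trace. This is purely illustrative: `find_wait_loops` and the `min_repeats` threshold are made-up names, not anything Deepseek ships, and it only catches verbatim repeats between "Wait." interruptions, not paraphrased ones.

```python
import re
from collections import Counter


def find_wait_loops(trace: str, min_repeats: int = 3) -> dict:
    """Flag reasoning segments that recur verbatim between "Wait." interruptions.

    Hypothetical helper for illustration; the threshold is an arbitrary assumption.
    """
    # Split the trace wherever the model interrupts itself with "Wait".
    raw_segments = re.split(r"\bWait\b[.,]?", trace)
    # Normalize case and whitespace so trivially different copies compare equal.
    segments = [" ".join(s.lower().split()) for s in raw_segments]
    counts = Counter(s for s in segments if s)
    repeated = {s: n for s, n in counts.items() if n >= min_repeats}
    return {
        "waits": len(re.findall(r"\bWait\b", trace)),
        "repeated": repeated,
        "looping": bool(repeated),
    }


if __name__ == "__main__":
    sample = "The answer is 4. Wait. Is it 4? Wait. Is it 4? Wait. Is it 4?"
    print(find_wait_loops(sample))
```

A real detector would probably need fuzzy matching (embeddings or n-gram overlap), since the model rarely repeats itself word for word.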
1 comment

The full model also gets into these infinite cycles. I just tried asking it the old river-crossing boat problem, but with two goats and a cabbage, and it went on and on forever.