Hacker News new | ask | show | jobs
by daemonologist 475 days ago
It says "wait" (as in "wait, no, I should do X") so much while reasoning it's almost comical. I also ran into the "catastrophic forgetting" issue that others have reported - it sometimes loses the plot after producing a lot of reasoning tokens.

Overall though quite impressive if you're not in a hurry.

2 comments

I read somewhere which I can't find now, that for the -reasoning- models they trained heavily to keep saying "wait" so they can keep reasoning and not return early.
Is the model using budget forcing?
I do not understand why to force wait when model want to output </think>.

why not just decrease </think> probability? if model really wants to finish maybe or could over power it in cases were it's really simple question. and definitely would allow model to express next thought more freely

  why not just decrease </think> probability?
Huggingface's transformers library supports something similar to this. You set a minimum length, and until that length is reached, the end of sequence token has no chance of being output.

https://github.com/huggingface/transformers/blob/51ed61e2f05...

S1 does something similar to put a lower limit on its reasoning output. End of thinking is represented with the <|im_start|> token, followed by the word 'answer'. IIRC the code dynamically adds/removes <|im_start|> to the list of suppressed tokens.

Both of these approaches set the probability to zero, not something small like you were suggesting.

I have a suspicion it does use budget forcing. The word "alternatively" also frequently show up and it happens when it seems logically that a </think> tag could have been place.