Hacker News
by Szpadel 472 days ago
I don't understand why you would force a "wait" when the model wants to output </think>.

Why not just decrease the probability of </think>? If the model really wants to finish, it could maybe still overpower the penalty in cases where the question is really simple. And it would definitely allow the model to express its next thought more freely.

1 comment

  why not just decrease </think> probability?
Hugging Face's transformers library supports something similar to this. You set a minimum length, and until that length is reached, the end-of-sequence token has no chance of being output.

https://github.com/huggingface/transformers/blob/51ed61e2f05...
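To make the minimum-length idea concrete, here is a rough pure-Python sketch of how I understand it, not the library's actual code. The function names, the toy three-token vocabulary, and the choice of index 2 as EOS are all made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mask_eos_until_min_length(logits, generated_len, min_length, eos_id):
    """Return a copy of logits with EOS disabled while the output is too short.

    Hypothetical helper: it only mimics the idea of a minimum-length rule.
    """
    masked = list(logits)
    if generated_len < min_length:
        masked[eos_id] = -math.inf  # softmax maps -inf to probability 0
    return masked

EOS_ID = 2                 # assumption: index 2 plays the role of EOS
logits = [2.0, 1.0, 3.0]   # toy vocabulary where EOS is the favorite

# Only 3 tokens generated so far, minimum is 10: EOS is impossible.
too_short = softmax(mask_eos_until_min_length(logits, 3, 10, EOS_ID))
# Minimum reached: the logits pass through untouched.
long_enough = softmax(mask_eos_until_min_length(logits, 10, 10, EOS_ID))

print("EOS prob below min length:", too_short[EOS_ID])
print("EOS prob at min length:", long_enough[EOS_ID])
```

Note that the other tokens are renormalized automatically, so the model still picks among them according to its own preferences.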

S1 does something similar to put a lower limit on the length of its reasoning output. The end of thinking is represented by the <|im_start|> token followed by the word 'answer'. IIRC the code dynamically adds <|im_start|> to, and removes it from, the list of suppressed tokens.

Both of these approaches set the probability to zero, not something small like you were suggesting.
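For comparison, here is a toy sketch of the softer variant you described: subtract a finite penalty from the end-of-thinking logit instead of setting it to negative infinity. Everything here is an assumption for illustration (the three-token vocabulary, the logit values, a penalty of 3.0, and the helper names); neither library is known to do this for the end-of-thinking token.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

END_THINK = 2  # assumption: index 2 stands in for the end-of-thinking token

def end_think_prob(logits, penalty):
    """Probability of END_THINK after subtracting `penalty` from its logit.

    penalty=0 leaves the model untouched; penalty=math.inf reproduces
    the hard suppression described above.
    """
    adjusted = list(logits)
    adjusted[END_THINK] -= penalty
    return softmax(adjusted)[END_THINK]

hesitant = [2.0, 1.0, 2.5]    # model only mildly prefers to stop thinking
confident = [2.0, 1.0, 9.0]   # model strongly wants to stop (easy question)

for name, logits in [("hesitant", hesitant), ("confident", confident)]:
    print(
        name,
        "base=%.4f" % end_think_prob(logits, 0.0),
        "soft=%.4f" % end_think_prob(logits, 3.0),
        "hard=%.4f" % end_think_prob(logits, math.inf),
    )
```

Subtracting a constant c from one logit divides that token's odds by e^c, so a mild preference gets pushed down a lot while a very strong one mostly survives. With hard suppression the probability is exactly zero no matter how confident the model is.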