by Szpadel
472 days ago
I don't understand why you would force "Wait" when the model wants to output </think>. Why not just decrease the probability of </think>? If the model really wants to finish, maybe it could overpower the penalty in cases where it's a really simple question. And it would definitely allow the model to express its next thought more freely.
|
https://github.com/huggingface/transformers/blob/51ed61e2f05...
S1 does something similar to put a lower limit on its reasoning output. End of thinking is represented with the <|im_start|> token, followed by the word 'answer'. IIRC the code dynamically adds/removes <|im_start|> to the list of suppressed tokens.
Both of these approaches set the probability to zero, not something small like you were suggesting.
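To make the difference concrete, here is a toy sketch in plain Python (not the transformers or s1 code). The function names, the three-token vocabulary, the token id, and the penalty value are all made up for illustration. It contrasts a hard ban, which sets the logit to negative infinity, with the soft penalty suggested upthread, which only subtracts a constant from the logit.

```python
import math

def softmax(logits):
    """Turn logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def suppress(logits, token_id):
    """Hard ban: a logit of -inf gives the token probability exactly zero."""
    out = list(logits)
    out[token_id] = float("-inf")
    return out

def penalize(logits, token_id, penalty):
    """Soft penalty: lower the logit, so a confident model can still emit the token."""
    out = list(logits)
    out[token_id] -= penalty
    return out

END_THINK = 2   # hypothetical id of the end-of-thinking token
PENALTY = 4.0   # hypothetical penalty strength

unsure = [1.0, 0.5, 3.0]   # model mildly prefers to stop thinking
certain = [1.0, 0.5, 9.0]  # model strongly prefers to stop thinking

p_banned = softmax(suppress(certain, END_THINK))[END_THINK]
p_unsure = softmax(penalize(unsure, END_THINK, PENALTY))[END_THINK]
p_certain = softmax(penalize(certain, END_THINK, PENALTY))[END_THINK]

print(f"hard ban, certain model:     {p_banned:.3f}")
print(f"soft penalty, unsure model:  {p_unsure:.3f}")
print(f"soft penalty, certain model: {p_certain:.3f}")
```

With the hard ban the token can never be sampled, however confident the model is. With the soft penalty the unsure case is pushed toward more thinking, while the certain case still ends its reasoning most of the time.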