
  why not just decrease </think> probability?

Hugging Face's transformers library supports something similar: you set a minimum length, and until that length is reached, the end-of-sequence (EOS) token has zero probability of being generated.

https://github.com/huggingface/transformers/blob/51ed61e2f05...
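Here's a minimal sketch of the idea in plain Python. It is not the library's actual code; transformers does this inside its logits processors (e.g. `MinLengthLogitsProcessor`, driven by the `min_length` / `min_new_tokens` generation settings). The function names and the toy `step_logits` model are hypothetical. While fewer than `min_length` tokens have been generated, the EOS logit is set to -inf, so greedy decoding can't pick it.

```python
import math

def min_length_mask(logits, cur_len, min_length, eos_token_id):
    # Block EOS while fewer than `min_length` tokens have been generated.
    out = list(logits)
    if cur_len < min_length:
        out[eos_token_id] = -math.inf  # probability becomes exactly 0 after softmax
    return out

def greedy_generate(step_logits, min_length, eos_token_id, max_length=20):
    # `step_logits` is a hypothetical stand-in for a model: it maps the tokens
    # generated so far to a list of next-token logits.
    tokens = []
    while len(tokens) < max_length:
        logits = min_length_mask(step_logits(tokens), len(tokens), min_length, eos_token_id)
        next_tok = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        tokens.append(next_tok)
        if next_tok == eos_token_id:
            break
    return tokens

def toy_model(tokens):
    # Toy model that always prefers EOS (id 2).
    return [1.0, 0.5, 3.0]

print(greedy_generate(toy_model, min_length=4, eos_token_id=2))  # → [0, 0, 0, 0, 2]
```

Without the mask (min_length=0) the toy model would stop immediately with [2].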

S1 does something similar to put a lower limit on the length of its reasoning output. The end of thinking is marked by the <|im_start|> token followed by the word 'answer'. IIRC, the code dynamically adds <|im_start|> to, and removes it from, the list of suppressed tokens.
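Roughly what that dynamic suppression could look like, as a hypothetical sketch rather than S1's actual code: `END_THINK_ID` stands in for the <|im_start|> token id, and `min_thinking_tokens` is an assumed name for the lower limit. The end-of-thinking token stays in the suppressed set until the floor is reached, then it's taken out.

```python
import math

END_THINK_ID = 3  # hypothetical id standing in for <|im_start|>

def suppress(logits, suppressed_ids):
    # Set every suppressed token's logit to -inf (probability 0).
    out = list(logits)
    for tok in suppressed_ids:
        out[tok] = -math.inf
    return out

def think_with_floor(step_logits, min_thinking_tokens, max_tokens=50):
    # `step_logits` is a hypothetical model stand-in: tokens so far -> logits.
    suppressed = set()
    thinking = []
    while len(thinking) < max_tokens:
        # Update the suppressed set each step: block the end-of-thinking
        # token below the floor, lift the block once the floor is reached.
        if len(thinking) < min_thinking_tokens:
            suppressed.add(END_THINK_ID)
        else:
            suppressed.discard(END_THINK_ID)
        logits = suppress(step_logits(thinking), suppressed)
        tok = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        thinking.append(tok)
        if tok == END_THINK_ID:
            break
    return thinking

def eager_model(tokens):
    # Toy model that always wants to stop thinking (id 3 has the top logit).
    return [0.0, 1.0, 0.2, 5.0]

print(think_with_floor(eager_model, min_thinking_tokens=3))  # → [1, 1, 1, 3]
```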

Both of these approaches set the probability to zero, not something small like you were suggesting.
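To make the difference concrete: adding a finite bias -b to a token's logit multiplies its unnormalized weight by e^(-b), so its probability p becomes p·e^(-b) / (1 - p + p·e^(-b)), which is smaller but never zero. Only a logit of -inf gives exactly zero. The helper names and token id below are made up for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bias_token(logits, token_id, bias):
    # Hypothetical helper: add `bias` to one token's logit.
    out = list(logits)
    out[token_id] += bias
    return out

logits = [1.0, 2.0, 3.0]
END_THINK = 2  # hypothetical id for the </think> token

p = softmax(logits)[END_THINK]
soft = softmax(bias_token(logits, END_THINK, -5.0))[END_THINK]       # "decrease the probability"
hard = softmax(bias_token(logits, END_THINK, -math.inf))[END_THINK]  # what the approaches above do

print(f"original={p:.4f} biased={soft:.6f} masked={hard}")
```

With the finite bias, a long enough sampled generation can still end early; with the mask it can't.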