by richardjennings
79 days ago
> The shorter "thinking" is, the less is the probability of it going astray

As long as the error introduced by the extra steps is smaller than the compounding error of sub-optimal token sampling, I would expect a better result. I think your choice of "wrong" is extreme: it suggests that a single such token can catastrophically spoil the result. The modern reality is that the model is usually able to recover.