|
|
|
|
|
by HanClinto
653 days ago
|
|
Further than that, it feels like we could use constrained generation of outputs [0] to force the model to do X amount of output inside of a <thinking> BEFORE writing an <answer> tag. It might not always produce good results, but I'm curious what sort of effect it might have to convince models that they really should stop and think first. [0]: https://github.com/ggerganov/llama.cpp/blob/master/grammars/... |
|