| HN Mirror

Further than that, it feels like we could use constrained generation of outputs [0] to force the model to do X amount of output inside of a <thinking> BEFORE writing an <answer> tag. It might not always produce good results, but I'm curious what sort of effect it might have to convince models that they really should stop and think first.

[0]: https://github.com/ggerganov/llama.cpp/blob/master/grammars/...