| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by HanClinto 653 days ago
	Further than that, it feels like we could use constrained generation of outputs [0] to force the model to do X amount of output inside of a <thinking> BEFORE writing an <answer> tag. It might not always produce good results, but I'm curious what sort of effect it might have to convince models that they really should stop and think first. [0]: https://github.com/ggerganov/llama.cpp/blob/master/grammars/...