| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by petercooper 503 days ago
	It's not RL, but you can get a long way with a thorough system prompt to encourage it to engage in 'thinking' behavior on its own without extra training. Just playing with it myself now with promising results - Mistral Small seems very receptive to this approach (not all models are - cough, Llama). Update: This is such a prompt: https://gist.github.com/peterc/955d797ee35b3c777d76a2d881d2f...