|
|
|
|
|
by petercooper
503 days ago
|
|
It's not RL, but you can get a long way with a thorough system prompt to encourage it to engage in 'thinking' behavior on its own without extra training. Just playing with it myself now with promising results - Mistral Small seems very receptive to this approach (not all models are - cough, Llama). Update: This is such a prompt: https://gist.github.com/peterc/955d797ee35b3c777d76a2d881d2f... |
|