Hacker News new | ask | show | jobs
by bugglebeetle 499 days ago
Interested to see what folks do with putting DeepSeek-style RL methods on top of this. The smaller Mistral models have always punched above their weight and been the best for fine-tuning.
1 comments

It's not RL, but you can get a long way with a thorough system prompt to encourage it to engage in 'thinking' behavior on its own without extra training. Just playing with it myself now with promising results - Mistral Small seems very receptive to this approach (not all models are - cough, Llama).

Update: This is such a prompt: https://gist.github.com/peterc/955d797ee35b3c777d76a2d881d2f...