


	by kcorbitt 571 days ago
	Lots of folks are working on open-source reasoning models trained with reinforcement learning right now. The best one atm appears to be Alibaba's 32B-parameter QwQ: https://qwenlm.github.io/blog/qwq-32b-preview/ I also recently wrote a blog post explaining how reinforcement fine-tuning works, which is likely at least part of the pipeline used to train o1: https://openpipe.ai/blog/openai-rft

1 comment

HappMacDonald 571 days ago

I don't know if I would call it "the best one" when it has "How many r in strawberry" as one of its example questions and, when tried, it arrives at the answer "two".
