Hacker News
by tw1984 490 days ago
What I learned from the paper is that, with really high-quality samples, the reasoning ability of a well-trained LLM can be activated using a very small number of SFT samples. It is an interesting finding but not a practical one, though, as you first need a far more capable reasoning model (R1 in this case) to get those 817 high-quality samples. DeepSeek-R1-Distill-Qwen-32B has better reasoning skills according to the same benchmarks.

Another trend I've noticed is that there are already 3 papers reporting similar findings using Qwen-2.5-Instruct. Did they find something interesting about LLMs in general, or something unique to Qwen-2.5-Instruct? I guess we need more experimental results to draw conclusions.