|
|
|
|
|
by Terretta
490 days ago
|
|
> To generate the perfect 817 math examples for LIMO, they used state of the art models like R1 to filter down from an initial pool of 10 million math problems. In other words, a whole lot of intelligence was used to craft a maximally informative and distilled set of fine-tuning data The paper, and this comment, seem awfully reminiscent of creating a textbook of curated "maximally informative and distilled" set of cognitive examples to teach students with foundational learning a next level of reasoning. The last few years of LLM progress have shown we can predict human "reasoning" responses to inputs by modeling likely human responses as if LLM generated. Put another way, most responses are not particularly reasoned, but chain of tokgen*. Sit near someone who "talks to herself" while doing problems and it's even more evident. --- * tokgen definition: Listen to conversations in a cafeteria. Many are something other than thoughtful, responses that follow the prompts, with near perfect predictability. To differentiate from these responses and speech that comes after a pause and reflect, one can use the labels thought versus token generation or tokgen. |
|
The 800+ training samples, each containing solutions with detailed reasoning steps, were primarily generated by DeepSeek r1 and advanced models. The reasoning processes within these training solutions are crucial. It's possible that the advanced models have encoded these reasoning processes through the generated samples. Given a sufficiently large model, it can effectively restore such reasoning weights, effectively adding a delta from DeepSeek r1, among others.
Therefore, it's not surprising that, with relatively few fine-tuning data, Qwen 2.5 has achieved such significant improvements.
This is merely a conjecture. Further research is needed to analyze and visualize the changes in network weights before and after fine-tuning.