by Terretta 490 days ago
> To generate the perfect 817 math examples for LIMO, they used state of the art models like R1 to filter down from an initial pool of 10 million math problems. In other words, a whole lot of intelligence was used to craft a maximally informative and distilled set of fine-tuning data

The paper, and this comment, seem awfully reminiscent of writing a textbook: a curated, "maximally informative and distilled" set of cognitive examples for teaching students who already have foundational learning the next level of reasoning.

The last few years of LLM progress have shown that we can predict human "reasoning" responses to inputs by modeling likely human responses as if they were LLM-generated. Put another way, most responses are not particularly reasoned, but a chain of tokgen*.

Sit near someone who "talks to herself" while doing problems and it's even more evident.

---

* tokgen definition: Listen to conversations in a cafeteria. Many are something other than thoughtful: responses that follow the prompts with near-perfect predictability. To differentiate these responses from speech that comes after a pause and reflection, one can use the labels "thought" versus "token generation," or tokgen.


After reviewing the paper and GitHub training dataset, I have the following observations:

The 800+ training samples, each containing solutions with detailed reasoning steps, were primarily generated by DeepSeek R1 and other advanced models. The reasoning processes within these training solutions are crucial. It's possible that the advanced models encoded these reasoning processes into the generated samples. A sufficiently large model may then be able to reconstruct such reasoning weights from those samples, effectively adding a delta from DeepSeek R1 and the other models.

Therefore, it's not surprising that, with relatively little fine-tuning data, Qwen 2.5 has achieved such significant improvements.

This is merely a conjecture. Further research is needed to analyze and visualize the changes in network weights before and after fine-tuning.
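One simple starting point for that kind of analysis is to measure, per parameter tensor, how far fine-tuning moved the weights relative to their original size. This is a minimal sketch, assuming both checkpoints have already been loaded and flattened into plain Python dicts mapping parameter names to lists of floats; the function names and toy values are hypothetical, and a large relative change in some layer would only hint at where a "reasoning delta" landed, not prove it.

```python
import math
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flattened list of weights.
# Loading and flattening real model checkpoints is assumed to happen elsewhere.
Checkpoint = Dict[str, List[float]]


def l2_norm(values: List[float]) -> float:
    """Euclidean norm of a flattened weight tensor (Frobenius norm for matrices)."""
    return math.sqrt(sum(v * v for v in values))


def relative_weight_change(before: Checkpoint, after: Checkpoint) -> Dict[str, float]:
    """Return ||after - before|| / ||before|| for each parameter present in both."""
    changes: Dict[str, float] = {}
    for name, w_before in before.items():
        w_after = after.get(name)
        if w_after is None or len(w_after) != len(w_before):
            continue  # skip parameters that are missing or changed shape
        delta_norm = l2_norm([a - b for a, b in zip(w_after, w_before)])
        base_norm = l2_norm(w_before)
        if base_norm > 0:
            changes[name] = delta_norm / base_norm
        else:
            # An all-zero original tensor: report 0 if unchanged, infinity otherwise
            changes[name] = 0.0 if delta_norm == 0 else float("inf")
    return changes


if __name__ == "__main__":
    # Toy stand-ins for a base model and its fine-tuned version (hypothetical values)
    base_model = {"layer0.weight": [3.0, 4.0], "layer1.weight": [1.0, 0.0]}
    tuned_model = {"layer0.weight": [3.0, 4.5], "layer1.weight": [1.0, 0.0]}
    # List parameters from most to least moved by fine-tuning
    ranked = sorted(relative_weight_change(base_model, tuned_model).items(),
                    key=lambda kv: kv[1], reverse=True)
    for name, change in ranked:
        print(f"{name}: {change:.3f}")
```

Normalizing by the original norm keeps large and small tensors comparable; plotting these ratios layer by layer would be one way to visualize where fine-tuning concentrated its changes.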

>The last few years of LLM progress have shown we can predict human "reasoning" responses to inputs by modeling likely human responses as if LLM generated. Put another way, most responses are not particularly reasoned, but chain of tokgen*.

Sorry, but I don't get the point of your comment as a whole, and of this part in particular. Yes, most day-to-day human conversations are quite predictable, but some people are still capable of generating original thoughts from time to time. And how is this related to the comment you are replying to?

> how is it related to the comment you are replying to

Sorry, here it is with the quote, stated differently:

> a whole lot of intelligence was used to craft a maximally informative and distilled set of fine-tuning data

A whole lot of intelligence is used to craft maximally informative and distilled sets of learning material into textbooks, to fine-tune the reasoning outcomes of our LLM-ish brains.

Or, put the other way around, what works for us can often inform what works for LLMs.