by Rumengol 495 days ago
The issue is that they claim that you don't need an extensive amount of data to do efficient reasoning. That claim is a bit misleading if you need a massive model to fine-tune and another one to assemble the small amount of data.

I've seen the textbook analogy used, but to me it's like a very knowledgeable person reading an advanced textbook to become an expert. Then they say they're better than the other very knowledgeable people because they read that manual, and that everyone can start from scratch using it.

So there's nothing wrong with making a more efficient model from an existing one. The issue is concluding that you don't need all the data that made the existing one possible in the first place. While that may be true, this is not how you prove it.


> The issue is that they claim that you don't need an extensive amount of data to do efficient reasoning.

They claim that efficient reasoning can be achieved by applying a small set of SFT samples. How that sample set is collected or filtered is irrelevant here. They just reported the fact that this is possible, and that by itself is a new and interesting finding.

I completely agree with the point made here. Setting aside the research controversy around the paper, from an engineering practice perspective the methodology it presents offers the industry an effective approach to distilling structured cognitive capabilities from advanced models and integrating them into less capable ones.

Moreover, I find the Less-Is-More Reasoning (LIMO) hypothesis particularly meaningful. It suggests that encoding the cognitive process doesn't require extensive data; instead, a small amount of data can elicit capabilities the model already has. In my opinion, this hypothesis and the observation behind it are highly significant and offer more valuable insight than the specific experiment itself.