by Mond_ 503 days ago
> Deepseek could only build this because of o1, I don’t think there’s as much competition as people seem to imply

And this is based on what exactly? OpenAI hides the reasoning steps, so training a model on o1 is very likely much more expensive (and much less useful) than just training it directly on a cheaper model.


Because before o1, literally no one was doing CoT-style test-time scaling. It is a new paradigm. The talking point back then was that LLMs had hit a wall.

R1's biggest contribution, IMO, is R1-Zero. I am fully sold that they don't need o1's output to be as good. But yeah, o1 is still the herald.

I don't think Chain of Thought in itself was a particularly big deal, honestly. It always seemed like the most obvious way to make AI "work". Just give it some time to think to itself, and then summarize and conclude based on its own responses.

Like, this idea always seemed completely obvious to me, and I figured the only reason why it hadn't been done yet is just because (at the time) models weren't good enough. (So it just caused them to get confused, and it didn't improve results.)

Presumably OpenAI were the first to claim this achievement because they had (at the time) the strongest model (+ enough compute). That doesn't mean COT was a revolutionary idea, because imo it really wasn't. (Again, it was just a matter of having a strong enough model, enough context, enough compute for it to actually work. That's not an academic achievement, just a scaling victory.)

But the idea that the more tokens you allocate to CoT, the better the model gets at solving the problem is a revolutionary one. And a model self-correcting within its own CoT was first brought out by the o1 model.

Chain of Thought has been known since 2022 (https://arxiv.org/abs/2201.11903); we were just stuck in a world where we were throwing more data and compute at training instead of looking at other improvements.

CoT is a common technique, but the scaling law, that more test-time compute spent on CoT generation correlates with better problem-solving performance, is from o1.