| HN Mirror



	by Mond_ 549 days ago
	> Deepseek could only build this because of o1, I don’t think there’s as much competition as people seem to imply

	And this is based on what exactly? OpenAI hides the reasoning steps, so training a model on o1 is very likely much more expensive (and much less useful) than just training it directly on a cheaper model.


karmasimida 549 days ago

Because before o1, literally no one was doing CoT-style test-time scaling. It is a new paradigm. The talking point back then was that LLMs had hit a wall.

R1's biggest contribution, IMO, is R1-Zero. I am fully sold that they don't need o1's output to be as good. But yeah, o1 is still the herald.


Mond_ 549 days ago

I don't think Chain of Thought in itself was a particularly big deal, honestly. It always seemed like the most obvious way to make AI "work". Just give it some time to think to itself, and then summarize and conclude based on its own responses.

Like, this idea always seemed completely obvious to me, and I figured the only reason why it hadn't been done yet is just because (at the time) models weren't good enough. (So it just caused them to get confused, and it didn't improve results.)

Presumably OpenAI were the first to claim this achievement because they had (at the time) the strongest model (+ enough compute). That doesn't mean CoT was a revolutionary idea, because imo it really wasn't. (Again, it was just a matter of having a strong enough model, enough context, and enough compute for it to actually work. That's not an academic achievement, just a scaling victory.)


karmasimida 548 days ago

But the idea that the more tokens you allocate to CoT, the better the model gets at solving the problem, is a revolutionary one. And a model self-correcting within its own CoT was first brought out by the o1 model.


Kubuxu 549 days ago

Chain of Thought has been known since 2022 (https://arxiv.org/abs/2201.11903); we were just stuck in a world where we kept throwing more data and compute at training instead of looking at other improvements.


karmasimida 548 days ago

CoT is a common technique, but the scaling law, that more test-time compute spent on CoT generation correlates with better problem-solving performance, is from o1.
