by littlestymaar 455 days ago

What makes you think so?

From ChatGPT 3.5 to o1, all LLM progress came from investment in training: either using much more data, or using higher-quality data thanks to synthetic data.

o1 (and then o3) broke this paradigm by applying a novel idea (RL + search on CoT), and that is why it was able to make progress on ARC-AGI.

So IMO the success of o3 supports the argument that we are in an idea-constrained environment.


torginus 455 days ago

This isn't a novel idea - some people tried the exact same thing the day GPT4 came out.

And going back even further, there's Goal-Oriented Action Planning (GOAP), an old-timey video game AI technique that basically searches through the solution space to construct a plan:

https://medium.com/@vedantchaudhari/goal-oriented-action-pla...

(besides the fact that almost all old-timey AI is state-space search)
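
To make "searching through solution space to construct a plan" concrete, here is a minimal sketch of the idea. It is not taken from the linked article: the `Action` type, the `plan` function, and the toy actions are all made up for illustration, and it uses plain breadth-first search where real GOAP implementations typically use A* with per-action costs.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    """Hypothetical action: applicable when all preconditions hold."""
    name: str
    preconditions: frozenset
    adds: frozenset                   # facts that become true
    removes: frozenset = frozenset()  # facts that stop being true


def plan(start, goal, actions):
    """Breadth-first search over world states.

    Returns the shortest list of action names that reaches a state
    containing every goal fact, or None if no such plan exists.
    """
    start, goal = frozenset(start), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for action in actions:
            if action.preconditions <= state:
                nxt = (state - action.removes) | action.adds
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [action.name]))
    return None


# Toy world (assumed for illustration only).
ACTIONS = [
    Action("get_axe", frozenset(), frozenset({"has_axe"})),
    Action("chop_wood", frozenset({"has_axe"}), frozenset({"has_wood"})),
    Action("make_fire", frozenset({"has_wood"}), frozenset({"warm"}),
           frozenset({"has_wood"})),
]

example_plan = plan(set(), {"warm"}, ACTIONS)
print(example_plan)  # ['get_axe', 'chop_wood', 'make_fire']
```

The planner knows nothing about axes or fires; it only chains actions whose preconditions are satisfied until the goal facts hold, which is the "state space search" part.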


littlestymaar 454 days ago

What's new is applying that to LLMs, that is.

> This isn't a novel idea - some people tried the exact same thing the day GPT4 came out.

What do you mean? Since GPT4's weights aren't available, you can't run RL on it by yourself. Only OpenAI can.
