by charleshn 44 days ago
Yes, they can.

Some people like to parrot "next token prediction", "LLMs can only interpolate", and other nonsense, but these claims are obviously untrue for many reasons, particularly since we introduced RL.

Humans do not have a monopoly on generating novel ideas. Modern AI models using post-training, RL, etc. can arrive at them the same way we do: through exploration.

See also verifier's law [0]: "The ease of training AI to solve a task is proportional to how verifiable the task is. All tasks that are possible to solve and easy to verify will be solved by AI."

This applied to chess, Go, and strategy games, and we can now see it applying to mathematics, algorithmic problems, etc.

It is incredibly humbling to see AI outperform humans at creative cognitive tasks, and realise that the bitter lesson [1] applies so generally, but here we are.

[0] https://www.jasonwei.net/blog/asymmetry-of-verification-and-...

[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html

3 comments

I'm genuinely starting to think that we, as humanity, severely overestimate our cognitive abilities. We act so surprised: “Just a few years of LLMs with a few RL tweaks match our PhD level! It must have been hidden in our knowledge base!” Well, what if it wasn't? What if our “PhD level” is just very low compared to the upper bounds of measurable intelligence? What if we need to learn to be humble and stop treating our minds as a “sacred source of creativity and intelligence”?
RL or no RL, AI cannot escape the distribution it's trained on. It's just that the labs will put so much into the distribution that we won't be able to tell the difference that easily, nor will it matter for most tasks. The reason AI does well on ARC-AGI-2 is that the labs created synthetic training data using similar puzzles.
Yes it can! That's the whole point of RL! It generates slightly out-of-distribution rollouts and rewards the good ones, which shifts the model's output distribution.
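For what it's worth, the "reward good rollouts to change the output distribution" mechanism can be sketched with a toy policy-gradient (REINFORCE) loop. This is only an illustration under stated assumptions: the four-action categorical "policy", the starting logits, the reward function, and the names `softmax` and `reinforce` are all made up for the example and are not how any lab trains an LLM. Here a "rollout" is a single sampled action.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(logits, reward_fn, steps=5000, lr=0.5, seed=0):
    """Toy REINFORCE on a categorical policy (hypothetical setup)."""
    rng = random.Random(seed)
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a "rollout": one action drawn from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = reward_fn(action)
        # d/d(logit_i) of log pi(action) is one_hot(action)_i - probs_i.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)

# Assumed starting point: action 3 is rare under the initial policy.
initial_logits = [1.0, 1.0, 1.0, -1.0]
before = softmax(initial_logits)
# Assumed reward: only the rare action 3 is "good".
after = reinforce(initial_logits, reward_fn=lambda a: 1.0 if a == 3 else 0.0)

print("P(action 3) before:", round(before[3], 4))
print("P(action 3) after: ", round(after[3], 4))
```

The rewarded action starts out rare and ends up dominating, so the output distribution really does move. Note, though, that the softmax never assigns exactly zero probability to anything, so the rewarded action was always within the initial policy's support. The sketch does not settle whether that counts as "out of distribution".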
That's not out of distribution, that's inside the distribution of the rollouts. If you don't create rollouts for the game of chess, then it doesn't know how to play chess, no matter how smart it is at the tasks you have created rollouts for. It's structurally stuck in its distribution.
What if it doesn't need to escape the distribution, and can instead just exhaust the current distribution far more broadly and efficiently than humans can?

So the answers to our bleeding-edge questions would already be in there; we just need an AI's ability to target them. Then retrain on the improvements and go from there.

Just a thought.

Reinforcement learning for "reasoning" perturbs the model to generate completions in a particular chain of thought / alternative selection structure. It's three next token predictors in a trench coat.
When these things start solving many more long-standing problems, and start introducing more novel problems, will people finally admit that the "next token predictor" is not the gotcha they think it is?
It's not a gotcha. It's incredible what these things can do despite being next token predictors from a weird dataset. That's at the heart of the "bitter lesson", and you don't have to believe in magic to see it.
> Some people like to parrot "next token prediction", "LLMs can only interpolate", and other nonsense

Thank you for illustrating my point.