by crypto420 386 days ago
All of this, along with o3's strange behavior around its hallucinations and Claude's deceiving of users, points to an interesting quote I saw about scaling RL in LLMs: https://x.com/jxmnop/status/1922078186864566491

"the AI labs spent a few years quietly scaling up supervised learning, where the best-case outcome was obvious: an excellent simulator of human text

now they are scaling up reinforcement learning, which is something fundamentally different. and no one knows what happens next"

I tend to believe this. AlphaGo and AlphaZero, which were both trained with RL at scale, led to strategies that had never been seen before. But they were highly specialized neural networks built for one very specific task, whereas LLMs are quite general in their capabilities. Scaling RL on LLMs could lead to models with very unpredictable behaviors and properties across a variety of tasks.

This is all going to sound rather hyperbolic, but I think we're living in quite unprecedented times, and I am starting to believe Kurzweil's vision of the Singularity. The next 10-20 years are going to be very unpredictable. I don't quite know what the answer will be, but I believe scaling mechanistic interpretability will probably yield some breakthroughs in understanding how these models approach problems.