by lossolo
333 days ago
In the context of our conversation and what OP wrote, there has been no breakthrough since around 2018. What you're seeing is the harvesting of all the low-hanging fruit from a tree that was discovered years ago. But the fruit is almost gone. All top models perform at almost the same level. All the "agents" and "reasoning models" are just products of training data. I wrote more about it here: https://news.ycombinator.com/item?id=44426993 You may also be interested in this article, which goes into even more detail: https://blog.jxmo.io/p/there-are-no-new-ideas-in-ai-only
AI researchers spent years figuring out how to apply RL to LLMs without degrading their general capabilities. That's the breakthrough. Not the existence of RL, but making it work for LLMs specifically. Saying "it's just RL, we've known about that for ages" does not acknowledge the work that went into this.
Similarly, the fact that new breakthroughs look like old research ideas is not particularly good evidence that we are heading into a winter. What are the limits of RL, really? Will we just get models that are highly performant at narrow tasks? Or will the skills we train LLMs for generalise? This is still an open question. RL for narrow domains like chess yielded superhuman results, and I am interested to see how far we will get with it for LLMs.
This also ignores active research that has been yielding great results, such as AlphaEvolve. This isn't a new idea either, but does that really matter? They figured out how to combine evolutionary algorithms with LLMs to improve code. So, there's another idea to add to your list of old ideas. What's to say there aren't more old ideas that will pop up when people figure out how to apply them?
Maybe we will add a search layer with MCTS on top of LLMs to allow progress on really large math problems by breaking them down into a graph of sub-problems. That wouldn't be a new idea either. Or we'll figure out how to train better reranking algorithms to sort our training data, to get better performance. That wouldn't be new either! Or we'll just develop more and better tools for LLMs to call. There's going to be a limit at some point, but I am not convinced by your argument that we have reached peak LLM.