by HarHarVeryFunny 297 days ago

There are basically three types of responses you can get from an LLM/agent:

1) A response originating from LLM pre-training, in a domain where there has not been any (successful) RL-for-reasoning post-training. In this case the amount of reasoning around the raw facts "recalled" by the LLM is going to be limited by whatever reasoning is present in the training data.

2) A non-agentic response in a domain like Math Olympiad problems, where the LLM was post-trained with RL to encourage reasoning that mirrors this RL training set. This type of domain-specific reasoning training seems to have little benefit to other domains (although in the early LLM days it was said that training on computer code did provide some general benefit).

3) An agentic response, such as from one of these research systems, where it seems the agent is following some sort of generic research / summarization template with prescribed steps. I've never tried these myself, but it seems they can be quite successful at deep diving and gathering relevant source material. The ability to reason over this retrieved material, though, is going to come down to the reasoning capability of the underlying model per 1) and 2) above.

Bottom line would seem to be that with today's systems, domain-specific reasoning capability largely comes down to RL post-training for reasoning in that specific domain, resulting in what some call "jagged" performance - excellent in some areas and very poor in others. Demis Hassabis, for one, seems to be saying that this will not be fixed until architectural changes/additions are made to bring us closer to AGI.