I don't think it would settle things even if we did manage to train a sufficiently large 1800 LLM.
LLMs are blank slates (like an uncultured, primitive human being; an LLM does come with built-in knowledge, but that knowledge is mostly irrelevant here). An LLM's output is purely a function of its input (the context), so an agentic system's capabilities do not equal the underlying LLM's capabilities.
If you ask such an LLM to "overturn Newtonian physics, come up with a better theory", of course it won't give you relativity just like that, the same way an uneducated human has no chance of coming up with relativity either.
However, ask it this:
```
You are Einstein ...
<omitted: 10 million tokens establishing Einstein's early life and learnings>
... Recent experiments have put these ideas to doubt, ...<another bunch of tokens explaining the Michelson–Morley experiment>... Any idea why this occurs?
```
and provide it with tools to find books, speak with others, run experiments, etc. Conceivably, the result will be different.
Again, we pretty much see this play out in coding agents:
Claude the LLM has no prior knowledge of my codebase, so of course it has zero chance of fixing a bug in it. Claude 4 is a blank slate.
Claude Code the agentic system can:
- look at a screenshot.
- know what the overarching goal is from past interactions & various documentation it has generated about the codebase, as well as higher-level docs describing the company and products.
- realize the screenshot is showing a problem with the program.
- form hypotheses about why the bug occurs.
- verify hypotheses by observing the world ("the world" to Claude Code is the codebase it lives in, so by "observing" I mean it reads the code).
- run experiments: modify code, then run a type check or unit test (although usually the final observation is outsourced to me, so I am the AI's tool as much as the other way around).
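To make the distinction concrete, here is a toy sketch of the loop in the list above. Everything in it is made up for illustration: `toy_llm` is a scripted stand-in for a model (a pure function of its context), and the `Agent` class, the tool names and the in-memory `files` dict are hypothetical. This is not how Claude Code is actually implemented, just the general observe / hypothesize / experiment shape.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical in-memory "codebase" containing one bug.
files = {"calc.py": "def add(a, b): return a - b"}


def toy_llm(context: str) -> str:
    """Scripted stand-in for a model: the reply depends only on the context."""
    if "OBSERVATION: tests passed" in context:
        return "DONE bug fixed"
    if "OBSERVATION: patched" in context:
        return "run_tests ."
    if "return a - b" in context:
        return "edit_code def add(a, b): return a + b"
    return "read_code calc.py"  # nothing observed yet, so go look at the world


def read_code(path: str) -> str:
    return files.get(path, "no such file")


def edit_code(new_source: str) -> str:
    files["calc.py"] = new_source
    return "patched"


def run_tests(_: str) -> str:
    namespace: dict = {}
    exec(files["calc.py"], namespace)  # toy experiment on our own in-memory string
    return "tests passed" if namespace["add"](2, 3) == 5 else "tests failed"


@dataclass
class Agent:
    llm: Callable[[str], str]
    tools: dict[str, Callable[[str], str]]
    context: list[str] = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 10) -> str:
        self.context.append(f"GOAL: {goal}")
        for _ in range(max_steps):
            # The model only ever sees the accumulated context.
            reply = self.llm("\n".join(self.context))
            self.context.append(f"MODEL: {reply}")
            if reply.startswith("DONE"):
                return reply
            # Treat the reply as "<tool> <argument>" and feed the result back in.
            tool_name, _, arg = reply.partition(" ")
            tool = self.tools.get(tool_name)
            observation = tool(arg) if tool else f"unknown tool {tool_name!r}"
            self.context.append(f"OBSERVATION: {observation}")
        return "gave up"


# The bare model, given only the goal, cannot fix anything.
bare_reply = toy_llm("GOAL: fix the bug in add")

# The same model inside a loop with tools can.
agent = Agent(toy_llm, {"read_code": read_code, "edit_code": edit_code, "run_tests": run_tests})
agent_reply = agent.run("fix the bug in add")
```

Same `toy_llm` in both cases; the only difference is what ends up in its context and what it is allowed to touch.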