by NitpickLawyer
221 days ago
a) No, Gemini 2.5 was shown to "win" gold without tools: https://arxiv.org/html/2507.15855v1
b) Reductionism isn't worth our time. Planning works in the real world, today (try any agentic tool like cc/codex/whatever). And if you're set on the purist view, there's mounting evidence from Anthropic that there is planning in the core of an LLM.
c) So... not true? Long context works today. This is simply moving the goalposts and nothing more: X can't do Y -> well, here they are doing Y -> well, not like that.
b) Next-token training doesn’t magically grant inner long-horizon planners.
c) Long context ≠ robust at any length. Degradation with scale remains.
Not moving goalposts, just keeping terms precise.