by djhn
40 days ago
I think I know the examples you’re talking about. They don’t show much in terms of reasoning. The Erdős problems have turned out to be largely brute force or rediscovery of older results. The Feb 2026 GPT-5.2 theoretical physics paper was the result of “dialogue between physicists and LLMs”, was called “grad student level” by experts in the field, and used a “custom harnessed” “internal OpenAI” model with “20 hours of reasoning”. Those quotes are from the OpenAI blog. The Matthew Schwartz physics paper with Claude this March involved “51,248 messages across 270 sessions, producing over 110 draft versions and consuming 36 million tokens”, and the actual contribution was Schwartz finding an error in Claude’s solution.