
by falcor84 495 days ago

> We still don't have a single theorem proved and published by a LLM without human aid.

I'm pretty sure that by "do math" the parent was referring to applying math, as one would do in the course of other tasks, and not mathematical research, just as by "code" they likely referred to writing code to solve a problem and not to algorithmic research.

And from my experience teaching & tutoring both math and programming at various levels, I would absolutely agree with the claim that AIs like Claude 3.7 Sonnet outperform more than 99% of humans at typical short tasks.

It'll probably take some more time until context, memory and tool use are improved sufficiently to allow AIs to tackle longer-term tasks effectively, but I'm sure they'll get there. And just as an example of progress, there was recently a post about the first "fully AI-generated paper to pass peer review without human edits or interventions" [0].

[0] https://www.rdworldonline.com/sakana-ai-claims-first-fully-a...