by gnulinux 449 days ago
> if I told you 10yrs ago that we'd have AI that can do math/code better than 99% of humans

This is not even remotely close to true. Like, not even a little bit. I use Cursor and Gemini for work daily, and I'd be hard pressed to call AI a "better" programmer than any professional software engineer. Sure, it makes writing code faster and more efficient, because you just press Tab and three lines are written for you. It absolutely isn't better than me at coding, though.

The claim about math is even more unbelievable than the claim about coding. We still don't have a single theorem proved and published by an LLM without human aid. LLMs can barely follow a discussion in basic topology. It's ridiculous to state they're better than 99% of people. More like 0% of mathematicians and maybe 50% of college freshmen.

3 comments

> We still don't have a single theorem proved and published by an LLM without human aid.

I'm pretty sure that by "do math" the parent was referring to applying math, as one would do in the course of other tasks, and not mathematical research, just as by "code" they likely referred to writing code to solve a problem and not to algorithmic research.

And from my experience teaching & tutoring both math and programming at various levels, I would absolutely agree with the claim that AIs like Claude 3.7 Sonnet surpass over 99% of humans at typical short tasks.

It'll probably take some more time until context, memory and tool-use are improved sufficiently to allow AIs to tackle longer-term tasks effectively, but I'm sure it'll get there. And just as an example of progress, there was recently a post about the first "fully AI-generated paper to pass peer review without human edits or interventions" [0].

[0] https://www.rdworldonline.com/sakana-ai-claims-first-fully-a...

The top 50% of college freshman math and physics majors is approximately equal to the top 1% of all people.

I realized today while coding with Cursor that AI seems to operate exactly the way I intuit it does: it acts like a junior engineer who works by copying existing code but doesn’t understand why. For a lot of tasks that works great. I do this a lot as a senior engineer, but I know when not to. You can’t let it run wild, because it doesn’t know when not to.

> a junior engineer who works by copying existing code but doesn’t understand why

Given the amount of time I have spent fixing code written like this over the years, that is not encouraging.