Hacker News
by tptacek 263 days ago
That's exactly what Math Academy is: I'm operating with a grounded set of correct, validated content, and using LLMs to (1) fill in more conceptual explanation and (2) check where I went off the rails when I get things wrong. You can't play the "hallucination" card here. An LLM can reliably do partial fraction decomposition, spot and solve an ODE that admits direct integration, calculate an arc length, invert a matrix, or resolve a gnarly web of trig identities. If you say a current frontier model can't do this --- and do it from OCR'd screencaps! --- I'll respond that you haven't tried.

I can't think of a single instance where O4 or GPT5 got one of these problems wrong. It sees maybe 6-12 of them per day from me. I've been doing this since February.

1 comment

That's very interesting. Maybe you are doing this the right way, and my concern as a math educator is for the people who may struggle to stay on the straight and narrow, or know what the straight and narrow is in this brave new world.

Where I see deficiencies is not so much in the calculations. When a problem class has a solution algorithm and 10,000 worked examples online, I'm not too surprised that the LLM generalizes pretty reliably to that problem class.

The problem I find is more when it's tricky, out-of-distribution, not entirely on the "happy path" of what the 10,000 examples are about. In that case, LLM responses quickly become irrelevant, illogical, and Pavlovian. It's the math version of messing up the surgeon riddle when presented with a minor variation that is logically very easy, but isn't the popular version everyone talks about [1].

[1] https://www.thealgorithmicbridge.com/p/openai-researchers-ha...

The International Mathematical Olympiad challenges should be pretty safely out of distribution. Gemini and OpenAI's best research models both scored gold on that this year.

When they make a model with those abilities publicly available, I'll happily experiment with it, and I'd anticipate reporting that it is a lot better than what I experienced in the past.

The Gemini one is out now but expensive:

> Gemini Deep Think, our SOTA model with parallel thinking that won the IMO Gold Medal, is now available in the Gemini App for Ultra subscribers!!

https://twitter.com/OfficialLoganK/status/195126226151265943...

No, we're not going to move the goalposts here. You can tweak any argument so that the thread goes nowhere and nobody can update their mental models by positing a sufficiently misguided user of a piece of technology. I'm saying: LLMs are quite good at math tutoring, in many ways probably significantly better than human tutors (they're tireless, can explain any concept 50 different ways, and can rattle off individualized problem sets in seconds). I made that claim, and you pushed back saying that anything I saw "needed to be validated by an expert". You even said that I was an unreliable narrator because I'm studying math. No, to all of this.