| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by tptacek 263 days ago
	That's exactly what Math Academy is: I'm operating with a grounded set of correct, validated content, and using LLMs to (1) fill in more conceptual explanation and (2) check where I went off the rails when I get things wrong. You can't play the "hallucination" card here. An LLM can reliably do partial fraction decomposition, spot and solve an ODE that admits direct integration, calculate an arc length, invert a matrix, or resolve a gnarly web of trig identities. If you say a current frontier model can't do this --- and do it from OCR'd screencaps! --- I'll respond that you haven't tried. I can't think of a single instance where O4 or GPT5 got one of these problems wrong. It sees maybe 6-12 of them per day from me. I've been doing this since February.

1 comments

getnormality 263 days ago

That's very interesting. Maybe you are doing this the right way, and my concern as a math educator is for the people who may struggle to stay on the straight and narrow, or know what the straight and narrow is in this brave new world.

Where I see deficiencies is not so much in the calculations. When a problem class has a solution algorithm and 10,000 worked examples online, I'm not too surprised that the LLM generalizes pretty reliably to that problem class.

The problem I find is more when it's tricky, out-of-distribution, not entirely on the "happy path" of what the 10,000 examples are about. In that case, LLM responses quickly become irrelevant, illogical, and Pavlovian. It's the math version of messing up the surgeon riddle when presented with a minor variation that is logically very easy, but isn't the popular version everyone talks about [1].

[1] https://www.thealgorithmicbridge.com/p/openai-researchers-ha...

link

simonw 263 days ago

The International Mathematical Olympiad challenges should be pretty safely out of distribution. Gemini and OpenAI's best research models both scored gold on that this year.

link

getnormality 263 days ago

When they make a model with those abilities publicly available, I'll happily experiment with it, and I'd anticipate reporting that it is a lot better than what I experienced in the past.

link

simonw 263 days ago

The Gemini one is out now but expensive:

> Gemini Deep Think, our SOTA model with parallel thinking that won the IMO Gold Medal , is now available in the Gemini App for Ultra subscribers!!

https://twitter.com/OfficialLoganK/status/195126226151265943...

link

tptacek 263 days ago

No, we're not going to move the goalposts here. You can tweak any argument so that the thread goes nowhere and nobody can update their mental models by positing a sufficiently misguided user of a piece of technology. I'm saying: LLMs are quite good at math tutoring, in many ways probably significantly better than human tutors (they're tireless, can explain any concept 50 different ways, and can rattle off individualized problem sets in seconds). I made that claim, and you pushed back saying that anything I saw "needed to be validated by an expert". You even said that anything I said was an unreliable narrator because I'm studying math. No, to all of this.

link