| HN Mirror

	by pjc50 264 days ago
	Is it always correct?

tptacek 264 days ago

In my experience, it's 100%. Not 95%, not 99%. Unless GPT-5 (and o4-mini) were colluding with Math Academy behind the scenes specifically to be wrong about something, it just doesn't get any of this content wrong.

And keep in mind, what it's getting right is trickier than just answering Calc I questions: it's taking an answer I give it, calculating the correct answer itself, selecting its answer over mine, and then spotting where I e.g. forgot to check the domain of a variable inside a log.
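To make the "domain of a variable inside a log" point concrete, here is a minimal sketch of that kind of check. The equation is a made-up illustration, not a problem from Math Academy: solving log10(x) + log10(x - 3) = 1 algebraically gives two roots, but only one keeps both log arguments positive.

```python
import math


def solve_log_equation():
    """Solve log10(x) + log10(x - 3) = 1 and filter roots by domain.

    Combining the logs gives x * (x - 3) = 10, i.e. x^2 - 3x - 10 = 0.
    (Hypothetical example equation, chosen only for illustration.)
    """
    a, b, c = 1, -3, -10
    disc = b * b - 4 * a * c
    # Both algebraic roots of the quadratic.
    roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]
    # Domain check: both log arguments must be strictly positive.
    valid = [x for x in roots if x > 0 and x - 3 > 0]
    return roots, valid


roots, valid = solve_log_equation()
print("algebraic roots:", roots)
print("roots inside the domain:", valid)
```

Skipping the final filter is exactly the sort of slip described above: the quadratic is solved correctly, but one root is not a solution of the original equation.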

Jensson 264 days ago

> In my experience, it's 100%. Not 95%, not 99%.

Yeah, they seem to be there on high school math problems today. There aren't that many variations on them, and there are billions of examples of them in the data, so LLMs can saturate those.

Just don't assume they are this reliable at solving real-world math tasks yet; those are more varied and still stump models.

simonw 264 days ago

They did well at the International Mathematical Olympiad this year.

AnotherGoodName 264 days ago

I've used LLMs to try to help digest some advanced maths. E.g. "Explain the number field sieve with lots of numeric examples".

Yes, the numeric examples often don't work. The consequences, though, are similar to a failed web search: it's not a big deal, and when it does work it's very helpful.

Maths is one of those things with so much objectivity that even the LLM usually realizes it has failed to create a numeric example. "Here the numeric example breaks down since we cannot find a congruence of squares in this example without finding more B-smooth numbers in step 1". OK, that's a shame; I would have loved to see an end-to-end numeric example.
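For reference, the congruence-of-squares step mentioned in that quote can be shown end to end on a toy number. This is only a sketch of that final step, in the style of Dixon's method, and not the number field sieve itself; the function name and the choice of n = 1649 with the relations 41 and 43 are illustrative assumptions.

```python
from math import gcd, isqrt


def congruence_of_squares(n, xs):
    """Try to factor n from relations x_i^2 = r_i (mod n).

    Works only when the product of the residues r_i is a perfect
    square y^2, so that x^2 = y^2 (mod n) with x = product of x_i.
    Returns a pair of nontrivial factors, or None if the attempt fails.
    """
    x = 1
    residue_product = 1
    for xi in xs:
        x = (x * xi) % n
        residue_product *= (xi * xi) % n
    y = isqrt(residue_product)
    if y * y != residue_product:
        return None  # residues do not combine into a perfect square
    factor = gcd(x - y, n)
    if factor in (1, n):
        return None  # trivial congruence, no factor found
    return factor, n // factor


# 41^2 = 32 and 43^2 = 200 (mod 1649); 32 * 200 = 6400 = 80^2.
print(congruence_of_squares(1649, [41, 43]))
# A single relation whose residue is not a square fails, which is the
# "need more smooth numbers" situation described in the quote.
print(congruence_of_squares(1649, [41]))
```

Here 41 * 43 = 114 (mod 1649), so 114^2 = 80^2 (mod 1649) and gcd(114 - 80, 1649) = 17, giving 1649 = 17 * 97.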

I think people get too hung up on any possibility of LLMs not being perfect while still being extremely helpful.

lomase 264 days ago

An LLM can't "realize" anything. Unless you are saying that LLMs are aware.

AnotherGoodName 264 days ago

It's a term I used to explain that in 'thinking' mode LLMs will read their own output and call out things like incorrect math statements before posting to the user.

Now you probably want a debate about the term 'thinking' mode, but I can't be bothered with that. It's pretty clear what was meant, and semantic arguments suck. Don't do that.

lomase 264 days ago

I want people to use correct terms; I don't think that is unreasonable.

chaps 264 days ago

I'm all for avoiding anthropomorphism of these things, but what word (or set of words) would you use instead?
