Hacker News
by tptacek 263 days ago
In my experience, it's 100%. Not 95%, not 99%. Unless GPT-5 (and o4-mini) were colluding with Math Academy behind the scenes specifically to be wrong about something, it just doesn't get any of this content wrong.

And keep in mind, what it's getting right is trickier than just answering Calc I questions: it's taking an answer I give it, calculating the correct answer itself, selecting its answer over mine, and then spotting where I e.g. forgot to check the domain of a variable inside a log.

1 comment

> In my experience, it's 100%. Not 95%, not 99%.

Yeah, they seem to be there on high school math problems today. There aren't that many variations on them, and there are billions of examples of them, so LLMs can saturate those.

Just don't assume they are this reliable at solving real-world math tasks yet; those are still more varied and stump models.

They did well at the International Mathematical Olympiad this year.