Hacker News
by tptacek 266 days ago
It's wild to me that, of all the things to call LLMs out for, this piece has chosen to include math tutoring. I've been doing Math Academy for a bit over 6 months now, going from (essentially) Algebra II through Calc II (integration by parts, arc lengths, Taylor expansions) and LLMs have been a huge part of what has made that effective:

* Clear explanations of concepts that respond to questions and reformulate when things bounce

* Step-by-step verification of solutions, spotting exactly where calculations have gone wrong

* Instantaneously generating new problem sets to reinforce concepts

LLMs are probably not going to live up to all sorts of claims their proponents make. But I don't think you can ever have tried to use an LLM in a math course and reach the conclusion that it's "demoware" for that application. At what point, over 6 months of continuous work, does it stop being a "demo"?

10 comments

This https://www.mathacademy.com/ ? Interesting, hadn't seen that before. I've been thinking I'd like to brush up on a bunch of those topics.
Wholeheartedly recommend it, just remember we're not the core market for it (that's high school students, though the curriculum goes all the way through the normal college math sequence).

Minutes later

In case I've spooked anyone, they have an adult course series (Foundations I, II, and III) that's accelerated by trimming out all the material their authors believe is important only for things like school placement exams; the modal adult Math Academy person is doing I, II, and III as a leadup to their Math for Machine Learning course, which is linear algebra and multivariable calc.

I think it's one of the three most mindblowing learning resources I've ever used. One of the other three: Lingua Latina Familia Romana. In both cases, I have the uncanny certainty that I am operating at the limit of my ability to acquire and retain new information, which is a fun place to be.

Generating problems is fantastic, but I'd caution on overreliance in the other two cases.

Basically all of the cognitive science literature on learning that I am aware of says that the more you do directly and the less hand holding you are given, the better your acquisition and long term retention. In particular, having the LLM elaborate concepts for you is probably one of the worst things you can do when it comes to learning. Struggling through that elaboration process yourself is going to make the learning stick much more strongly, at least if all of the research is to be believed.

I understand that. The core of the pedagogical approach here is Math Academy, not LLMs. (Math Academy isn't an LLM; it's a spaced-repetition accelerated curriculum centered on graded problem set submissions). But the LLM functions exactly the way a tutor would in a math course, and for that application, LLMs have become extremely effective; arguably more effective than human tutors.
It seems very hard to maintain the belief that LLMs are useless in the face of the fact that millions of people are using them. It's very much "nobody goes there anymore, it's too crowded"
I think you'd be crazy to say LLMs are blockchain-style hype when it comes to software development but I don't begrudge anybody who believes they're not currently workable for the kinds of problems they work on; I think reasonable people can disagree about how ready for prime time they are for production software development.

But for math tutoring? If you claim LLM math tutoring is demoware, you're very clearly telling on yourself.

I wouldn't trust the LLM's raw output to be correct, but math is provable and if there was a filter between the LLM's output (which would be in some more rigid/structured format, not free form text) and whatever the user sees that tries to prove the LLM's output is correct (and try again if it goes wrong[0]), then i can see LLMs being perfectly fine for that.

In fact i'd say in general anything that LLMs produce that can be "statically checked" in some way, can be fine to rely on. You most likely need more than a chat interface though, but i think in general it is plausible for such solutions to exist.

[0] hopefully it won't end up always failing, ending up in an infinite loop :-P
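
A minimal sketch of the "filter" idea described above, with a cap on retries so it cannot loop forever. Everything here is an assumption for illustration: `fake_model` is a hypothetical stand-in for an LLM call, and the check is a simple substitution into one equation, not a general proof checker.

```python
from fractions import Fraction
from typing import Callable, Optional


def checked_answer(
    propose: Callable[[int], Fraction],
    verify: Callable[[Fraction], bool],
    max_attempts: int = 3,
) -> Optional[Fraction]:
    """Return the first proposed answer that passes verification, else None."""
    for attempt in range(max_attempts):
        candidate = propose(attempt)
        if verify(candidate):
            return candidate
    return None  # give up instead of retrying forever


# Hypothetical stand-in for a model: wrong on the first try, right afterwards.
def fake_model(attempt: int) -> Fraction:
    return [Fraction(5, 2), Fraction(7, 2)][min(attempt, 1)]


# Independent check: substitute the candidate back into 2x + 3 = 10.
def solves_equation(x: Fraction) -> bool:
    return 2 * x + 3 == 10


result = checked_answer(fake_model, solves_equation)
print(result)  # 7/2
```

The user only ever sees answers that passed the check; when every attempt fails, the caller gets `None` and can say so rather than show an unverified result.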

(OP) In my post, I actually ask the question of whether a student would _want_ to interact with the tutor, not if the tutor is capable of providing good instruction. These are drastically different critiques.
I have seen LLMs fabricate bogus calculations; I personally would be hesitant to use an LLM as my one and only source of math learning, but I suppose using it in conjunction with something like Math Academy mitigates that issue? You've clearly had good success here, but any problem areas with the LLM to watch out for?
On that basis you'll also be adopting TCM, and homeopathy, and dowsing, and all major religions simultaneously, and all major fad diets simultaneously? Like, "lots of people like this thing and think it is helping them" is not terribly strong evidence that it is actually helping them. It's not a good argument.
How about "lots of people like this thing" where many of those people are credible professionals who I have respected from LONG before they started using LLMs?
Again, it's not _hugely_ strong evidence. Linus Pauling won _two_ Nobel prizes, was unquestionably brilliant... but that doesn't mean I'm going to start megadosing vitamin C anytime soon (https://en.wikipedia.org/wiki/Linus_Pauling#Medical_research...).

(See also Newton and alchemy, and the list goes on.)

The experts I respect on this are people whose expertise is in software development, so when they say "this stuff helps me do what I do better and faster" I trust them more than if they said "I've started megadosing vitamin C and it's amazing".

It helps that what they're discovering matches my own personal experience as well.

there are just as many equally-qualified experts who have been consistently giving exactly the opposite feedback/signal, which you never seem to acknowledge or incorporate into your comments
Offtopic, but do you have any comparison of Math Academy to something like Khan, or other platforms? MA seems a bit expensive for someone just wanting to improve a general skill, but perhaps it's well worth it? I thought Khan was also investing in similar AI offerings, so I'm curious how they intersect.
Khan never clicked for me, and while the cost of Math Academy is below my noise floor (when you back it out to $/hr of engagement) as an adult professional in his prime earning years, I should also add that the cost is also a motivator: I've never been tempted to take a break, in part because I'm on the meter.
While I agree, on an unrelated note - I knew I knew your nick from somewhere...

And then I realized[0].

[0] https://ludic.mataroa.blog/blog/contra-ptaceks-terrible-arti...

I had a conversation with that person a couple weeks ago. They're nice. I think we both would tweak (if just a little bit) how we presented our articles with the benefit of hindsight.

For the record, I'm a systems programmer and a security person and I don't work for an AI company (you can Six Degrees of Sam Altman any startup to AI now if you want to make the claim, but if you try I'm just going to say "Sir, This Is A Wendy's".)

> I think we both would tweak (if just a little bit) how we presented our articles with the benefit of hindsight.

Maybe you could present a joint statement of some kind, that would be interesting. I enjoy listening to the arguments of both camps and constantly comparing them to the actual state of things - and my conclusion is, sorry for the cliche, the only constant is change.

I on the other hand am the mental case that drinks rocket fuel.
Absolutely.

This piece feels like a “I tried it out how I could” piece vs “I spent time learning how others are learning math with LLMs too”

LLMs will make meaningful advances in personalized learning.

Some of the frameworks might evolve along the way.

So .. a person who doesn't know X, is using LLMs to learn X, yet is able to judge that LLMs are doing a good job at teaching X, even though the person doesn't know X?
There are many, many things in life where you can evaluate if you are learning the thing despite not having access to an expert guide who can verify what you are learning.

Cooking: does the food taste better as you learn more?

Programming: are you able to build functioning software that does what you want it to do, better than you could earlier on in your path?

Fixing a broken dishwasher: does the dishwasher work again now?

The idea that learning only works if you have an expert on hand to verify that you are learning is one of those things that seems obviously true until you think harder about it.

You're confused. Math Academy is not an LLM.
Is it always correct?
In my experience, it's 100%. Not 95%, not 99%. Unless GPT5 (and O4-mini) were colluding with Math Academy behind the scenes specifically to be wrong about something, it just doesn't get any of this content wrong.

And keep in mind, what it's getting right is trickier than just answering Calc I questions: it's taking an answer I give it, calculating the correct answer itself, selecting its answer over mine, and then spotting where I e.g. forgot to check the domain of a variable inside a log.
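
To make the "domain of a variable inside a log" slip concrete, here is a small sketch. The equation is a made-up example (not one from Math Academy): solving log2(x) + log2(x - 2) = 3 algebraically yields two candidates, and only one survives the domain check.

```python
import math


def solve_log_equation() -> list[float]:
    """Solve log2(x) + log2(x - 2) = 3, keeping only roots in the domain."""
    # Combining the logs gives x * (x - 2) = 2**3, i.e. x**2 - 2x - 8 = 0.
    a, b, c = 1.0, -2.0, -8.0
    disc = math.sqrt(b * b - 4 * a * c)
    candidates = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    # Both logs need positive arguments: x > 0 and x - 2 > 0.
    return [x for x in candidates if x > 0 and x - 2 > 0]


roots = solve_log_equation()
print(roots)  # [4.0]
```

The quadratic also has the root x = -2, which is the extraneous answer you end up reporting if you skip the domain check.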

> In my experience, it's 100%. Not 95%, not 99%.

Yeah, they seem to be there on high school math problems today: there aren't that many variations on them, and there are billions of examples of data on them, so LLMs can saturate those.

Just don't assume they are this reliable on solving real world math tasks yet, those are more varied still and stump models.

They did well at the International Mathematical Olympiad this year.
I've used LLMs to try to help digest some advanced maths. Eg. "Explain the number field sieve with lots of numeric examples".

Yes the numeric examples often don't work. The consequences of this though are similar to a failed web search. As in it's not a big deal and when it does work it's very helpful.

Maths is one of those things with so much objectivity that even the LLM usually realizes it has failed to create a numeric example. "Here the numeric example breaks down since we cannot find a congruence of squares in this example without finding more B-smooth numbers in step 1". Ok that's a shame, i would have loved to see an end to end numeric example.
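
For reference, this is what an end-to-end congruence of squares looks like on a tiny number. It is only a sketch of the underlying idea using Fermat's method (scan for x with x^2 - n a perfect square), not the number field sieve itself; the choice of n = 8051 is an illustrative assumption, and the loop assumes n is an odd composite.

```python
from math import gcd, isqrt


def fermat_congruence(n: int) -> tuple[int, int]:
    """Find x, y with x*x - y*y == n by scanning x upward from ceil(sqrt(n))."""
    x = isqrt(n)
    if x * x < n:
        x += 1
    while True:
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:
            return x, y
        x += 1


n = 8051
x, y = fermat_congruence(n)
p = gcd(x - y, n)  # x^2 = y^2 (mod n), so gcd(x - y, n) can expose a factor
q = n // p
print(x, y, p, q)  # 90 7 83 97
```

Here 90^2 - 7^2 = 8100 - 49 = 8051, so the factors fall out as 90 - 7 = 83 and 90 + 7 = 97.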

I think people get too hung up on any possibility of LLMs not being perfect while still being extremely helpful.

An LLM can't "realize" anything. Unless you are saying that LLMs are aware.
It's a term i used to explain that in 'thinking' mode LLMs will read their own output and call out things like incorrect math statements before posting to the user.

Now you probably want a debate about the term 'thinking' mode but i cbf with that. It's pretty clear what was meant and semantic arguments suck. Don't do that.

I want people to use correct terms, i don't think that is unreasonable.
I'm all for avoiding anthropomorphism of these things, but what word (or set of words) would you use instead?
It's nice that you think it's clear and responsive, but I think it [1] needs to be validated by an expert in both the material and education. Or we need some way to show that people have actually learned the topic. People sometimes prefer explanations that are intuitive and familiar but not accurate.

Meanwhile, there are math education resources like iXL that maybe cost a little money but the lessons and practice problems are fully curated by human experts (AFAICT). I'm not saying these resources are perfect either, but as a mathematician who has experimented a lot with LLMs, including in supposed tutoring modes, they make a lot of mistakes and take a lot of shortcuts that should materially decrease their effectiveness as tutors.

[1] LLM-based tutoring (edit: footnote added to clarify)

That's exactly what Math Academy is: I'm operating with a grounded set of correct, validated content, and using LLMs to (1) fill in more conceptual explanation and (2) check where I went off the rails when I get things wrong. You can't play the "hallucination" card here. An LLM can reliably do partial fraction decomposition, spot and solve an ODE that admits direct integration, calculate an arc length, invert a matrix, or resolve a gnarly web of trig identities. If you say a current frontier model can't do this --- and do it from OCR'd screencaps! --- I'll respond that you haven't tried.

I can't think of a single instance where O4 or GPT5 got one of these problems wrong. It sees maybe 6-12 of them per day from me. I've been doing this since February.

That's very interesting. Maybe you are doing this the right way, and my concern as a math educator is for the people who may struggle to stay on the straight and narrow, or know what the straight and narrow is in this brave new world.

Where I see deficiencies is not so much in the calculations. When a problem class has a solution algorithm and 10,000 worked examples online, I'm not too surprised that the LLM generalizes pretty reliably to that problem class.

The problem I find is more when it's tricky, out-of-distribution, not entirely on the "happy path" of what the 10,000 examples are about. In that case, LLM responses quickly become irrelevant, illogical, and Pavlovian. It's the math version of messing up the surgeon riddle when presented with a minor variation that is logically very easy, but isn't the popular version everyone talks about [1].

[1] https://www.thealgorithmicbridge.com/p/openai-researchers-ha...

The International Mathematical Olympiad challenges should be pretty safely out of distribution. Gemini and OpenAI's best research models both scored gold on that this year.
When they make a model with those abilities publicly available, I'll happily experiment with it, and I'd anticipate reporting that it is a lot better than what I experienced in the past.
The Gemini one is out now but expensive:

> Gemini Deep Think, our SOTA model with parallel thinking that won the IMO Gold Medal , is now available in the Gemini App for Ultra subscribers!!

https://twitter.com/OfficialLoganK/status/195126226151265943...

No, we're not going to move the goalposts here. You can tweak any argument so that the thread goes nowhere and nobody can update their mental models by positing a sufficiently misguided user of a piece of technology. I'm saying: LLMs are quite good at math tutoring, in many ways probably significantly better than human tutors (they're tireless, can explain any concept 50 different ways, and can rattle off individualized problem sets in seconds). I made that claim, and you pushed back saying that anything I saw "needed to be validated by an expert". You even said that I was an unreliable narrator because I'm studying math. No, to all of this.
What makes you think https://www.mathacademy.com/faq hadn't been evaluated by experts?

That appears to be their whole thing, and they've been in business for longer than LLMs have been around.

I think before that question is useful to ask, we have to know if that FAQ even says anything about LLM-based tutoring. After a few minutes of research, I can't find any evidence that Math Academy offers LLM-based tutoring.
This was linked from the homepage: https://www.mathacademy.com/how-our-ai-works

But more importantly if tptacek says they use LLMs and is a user of the platform that's good enough for me.

I'm using LLMs alongside Math Academy. Math Academy uses machine learning generally (and so now they plug their "AI" technology) but it's not transformer-model-style AI ML; as I understand it, it's just driving their underlying spaced repetition system (which is interleaved through lots of different units).

In the scenario I'm discussing, Math Academy's content is a non-generative source of truth, against which I've benchmarked GPT5 and O4-mini.

Everything described there sounds like old-school adaptive algorithms. I don't see anything about generative AI or LLMs.

I asked Google if MA does LLM tutoring and got back this answer:

> Math Academy does not offer Large Language Model (LLM) tutoring. While the company advertises itself as "AI-powered," this is in reference to a machine-learning-based adaptive learning system, not an interactive LLM tutor.

And here is a HN comment that indicates LLMs are a complement to MA, not part of it: https://news.ycombinator.com/item?id=43281240

You're right, I may have misinterpreted what tptacek said: he said he was using LLMs and that he was using Math Academy but I interpreted that as "Math Academy includes LLM features" - actually it's equally likely he's using Math Academy and having LLMs tutor him on the side.

(Confirmed I got this wrong: https://news.ycombinator.com/item?id=45439001)

You're confused. Math Academy isn't LLM-based. I use an LLM alongside it.
I think parent was clearly referring to LLM use, and not math academy.
I agree that LLM output needs to be validated to be valuable but math (unless it's on a quite high level I suppose) seems like one of the areas with the most potential for doing validations, without requiring an expert to validate everything.

If you're working on educational math problems with solutions you can validate against the solutions. If you're working with proofs you can evaluate the proofs in a proof checker. Or you can run the resulting math expressions through a calculator.
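
As a rough sketch of the "run it through a calculator" option: compare an answer-key expression with a tutor's expression at a handful of sample points. The expressions and sample points are illustrative assumptions, and agreement at sample points is evidence of equivalence, not a proof.

```python
import math
from typing import Callable


def agree(f: Callable[[float], float], g: Callable[[float], float],
          points: list[float], tol: float = 1e-9) -> bool:
    """True if f and g match at every sample point (evidence, not a proof)."""
    return all(math.isclose(f(x), g(x), rel_tol=tol, abs_tol=tol) for x in points)


samples = [-2.0, -0.5, 0.3, 1.0, 2.5]

answer_key = lambda x: math.sin(2 * x)
same_in_other_form = lambda x: 2 * math.sin(x) * math.cos(x)
wrong = lambda x: 2 * math.sin(x)

print(agree(answer_key, same_in_other_form, samples))  # True
print(agree(answer_key, wrong, samples))               # False
```

This catches the common case where an explanation arrives in a different but equivalent form from the answer key, without needing an expert to referee.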

There is a bit of oversimplification here.

Understanding if the student has actually learned is a competency piece, in math it’s mostly show your work and/or did you have the right answer.

The continued top down attempts to boil the whole sea with LLMs is part of the current problem.

It’s getting pretty good though for focused tutoring.

For students, models set up to tutor are too often trying to boil a sea (all education) instead of a kiddie pool. The reality is that it seems more and more like K-6, if not K-12, students can be supported.

If we look at the EdTech space from the bottom up, namely learner-centric, there is both a real need and opportunity.

For school age students, math largely has not changed in hundreds of years, and doesn’t change often. Either you understand it or have to put in the work.

There’s no shortage of human created written teaching resources. A teacher could create their own tutoring assistant based off their explanations.

Alternatively, an open source textbook could be inputted. There’s a reason why training or fine tuning on books has caused lawsuits - it can increase accuracy many fold.

Teachers are burdened with repetitive marking, there’s def a place for personalized marking tools.

We know LLMs respond differently to different input. Their superpower is being able to regenerate an input in many different ways, which can include personalization.

Just because one has experimented with LLMs and hasn’t been able to work out how to get a result from them doesn’t mean there isn’t a way.

If examples of the chat logs or prompts can be provided of what did or didn’t work, it helps to have a conversation without the subjectivity.

Mathematics is a great lens to see that folks are trying to get non-deterministic software to behave like all the deterministic software we’ve had before, instead of finding the places where non-deterministic strengths can shine.

It’s not all or nothing, or one or the other.

>I think it needs to be validated by an expert in both the material and education

LLMs getting it wrong is terrible when it matters but i also don't think it's a huge problem when it comes to acting as an additional resource to learning. Here the parent is using a lesson plan that costs money and using LLM for a little more explanation. It's similar to using web search on a topic and sometimes you get a hit, sometimes you don't.

Asking LLMs for numeric examples of complex maths sometimes fails. It's easy to spot and no great loss. When it works though it's extremely helpful to follow through.

Not sure the condescending tone is really necessary. I’d agree with you if the parent comment was saying they asked an LLM to create a math curriculum and problems for them. But they’re using an established app created by a math major and then using LLMs to ask questions. It’s easier to validate the responses you get back in those cases.
I think students are not a reliable source of information about the effectiveness of LLM tutoring. There is no 100% nice way to say this, but I did my best. You're free to disagree, but I think the tone criticism is off-base.
I agree with you completely. People mistake the impression of learning for learning itself super easily. This is why we have examinations and other tests of mastery, after all. I think using LLMs for generating exams or supplementary material is great, but using them to develop accurate understanding that would actually turn into long term retention seems dubious to me.
We found our way to "No True Math Student". I love it!
It’s interesting how people insist math requires expert validation when it’s literally the most self validating subject there is. The instinct to gatekeep even something as mechanistically checkable as algebra says more about insecurity in education than it does about rigor.
Wanting an actual check on the device that is notorious for making things up is gatekeeping now?
You’re projecting a bad faith use case that the original commenter never described. They’re using it in an exploratory and iterative way, not a deferential one.
If you're using it for education it is by definition deferential.
No it isn't. Again, what's happening here I think is that this thread doesn't understand what Math Academy is. It's not an LLM. I'm using the LLM alongside it.
"5.11 or 5.9 which number is greater?" was a meme query a few months ago to ask an LLM as it would confidently prove how 5.11 is greater - so yes, we do need expert validation!
A very, very big problem we have with LLM discourse is that LLMs have changed radically since the beginning of last year. If you're making an argument about modern foundation models based on the idea that they can't generate reliably correct answers to whether 5.11 is greater than 5.9, your mental model is completely out of date.

You don't have to believe me on this, just your own lying eyes. Go try this for yourself right now: ask it dy/dx of h(x)/g(x) where h(x) is x^3 + 1 and g(x) is -2e^x. That's a random Math Academy review problem I did last night that I pulled out of Notes.app. Go look.
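
For anyone who wants to check whatever answer comes back: the quotient rule gives dy/dx = (x^3 - 3x^2 + 1) / (2e^x) for this problem. The sketch below compares that closed form with a numerical derivative; the sample points and step size are arbitrary choices made for illustration.

```python
import math


def y(x: float) -> float:
    """h(x) / g(x) with h(x) = x**3 + 1 and g(x) = -2 * e**x."""
    return (x**3 + 1) / (-2 * math.exp(x))


def dy_dx(x: float) -> float:
    """Quotient-rule result, simplified: (x**3 - 3x**2 + 1) / (2 e**x)."""
    return (x**3 - 3 * x**2 + 1) / (2 * math.exp(x))


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical derivative of f at x using a symmetric difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-1.0, 0.0, 0.5, 2.0):
    assert math.isclose(dy_dx(x), central_difference(y, x), rel_tol=1e-6, abs_tol=1e-6)
print("closed form matches the numerical derivative")
```

If a model hands back a different-looking expression, swapping it into `dy_dx` is a quick way to see whether it is the same function.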

I think you’re misreading the situation. The original commenter isn’t outsourcing thinking; they’re using the tool to probe and test ideas, not to blindly accept end-result answers, which LLMs are (currently) not to be trusted with.
Isn't this moving the goalposts? It's great that you're learning but MathAcademy appears to be a whole product that may incorporate an LLM but is much more, and it's a paid product none of us can evaluate. It's not possible to tell from looking at their site, or from your comment, what content is generated, or how it is verified before being used as teaching material.

There are probably smart ways to incorporate LLM output into an application like the one you're lauding but your comment is a little like responding "but my cake tastes good" to someone who says you shouldn't eat raw flour.

You're confused. Math Academy isn't LLM-based.