Hacker News
by baq 3 hours ago
AI used to write homework should be banned.

AI in 1:1 tutor mode with proper hardware (live scanning pen and paper), harness and guardrails should be wildly successful (in terms of education outcomes) especially in elementary school.

6 comments

Disagree. AI has no business being used in 1:1 tutor mode before the hallucination and sycophancy issues are completely resolved. As is, I can easily see it being a hindrance to building actual understanding.

Just one example - it's very common to see ChatGPT and the like respond with "you're absolutely correct! Great insight" to something that is a complete misunderstanding.

This is specifically a consumer model (or specifically ChatGPT) issue. E.g., IME Codex does not do this, and will just tell you when you're missing something or are somehow wrong, and Gemini does this weird thing where it tells you you're a genius and then immediately starts correcting everything you said.
Sycophancy is just one aspect of the problems I mentioned, though. Another huge one is hallucination, and one that is actually far worse than I thought:

> It’s been proven that when a model is trained on large volumes of highly factual and non-theoretical data, it learns to always have an answer. DeepSeek V4 Pro (1.6T params, 49B active, 44 AA Intelligence Index score) has a ludicrous 94% hallucination score on the AA-Omniscience benchmark, meaning on questions that it couldn’t figure out, it only stated that it didn’t know around 6% of the time, and the rest it confidently hallucinated an answer. GLM-5.2 scored a 28% hallucination rate, Opus 4.8 was 36%, Fable 5 was 48%, and GPT-5.5 was 86%.

https://arrowtsx.dev/bigger-models/

I think even a 5% hallucination rate would be terrible for a teacher, who should generally be comfortable with saying "I don't know off the top of my head but here is how to find resources to answer your question".

---

So, just to drive the point home, Codex has an 86.9% hallucination rate on the AA-Omniscience benchmark in this index: https://benchlm.ai/models/gpt-5-3-codex. If you ask it something that wasn't sufficiently covered in its training data, it will confidently make up an answer nearly 87% of the time.

While you might think it is happy to correct you when you are wrong, you don't know that for sure since you don't know when you're wrong. Codex may have been happily agreeing with you about things you had completely backwards.
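
For anyone unsure what that number means, here is a minimal sketch of the hallucination rate as the quoted article describes it: of the questions the model could not answer correctly, the share where it gave a wrong answer instead of saying it didn't know. The function name and the counts are hypothetical, picked only to reproduce the 94% figure from the quote; the benchmark's real scoring may differ in details.

```python
def hallucination_rate(wrong: int, abstained: int) -> float:
    """Share of not-correctly-answered questions where the model
    answered wrongly instead of admitting it didn't know.

    Assumption: this follows the description in the quoted article,
    not an official benchmark implementation.
    """
    not_known = wrong + abstained
    if not_known == 0:
        raise ValueError("no wrong or abstained answers to score")
    return wrong / not_known


# Hypothetical counts: out of 100 questions the model couldn't figure out,
# it made up an answer 94 times and said "I don't know" 6 times.
rate = hallucination_rate(wrong=94, abstained=6)
print(f"{rate:.0%}")
```

Note that correctly answered questions don't enter the ratio at all, so a model can be very accurate overall and still score badly here.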

Except I generally do know when I'm wrong because I'm working in a domain I am familiar with, and it will often create experiments on the fly unprompted (well, prompted, but generically in AGENTS.MD) to check itself. My experience actually using it for software is that it almost never makes up answers.
Just realized 1:1 AI is 90s self-esteem medals-for-everyone parenting on steroids.
Teachers hallucinate too. I’ve had creationists and communists and tin-foil-hat (chemtrails, 5G, etc.) teachers. Surely you can imagine an AI tutor that has a higher-than-zero ROI.
> I’ve had creationists and communists and tin-foil-hat (chemtrails, 5G, etc.) teachers.

I certainly have, too, but there is still a difference between a person who has a factually incorrect but consistent worldview and an LLM which simply reflects the worldview of the user or even changes between queries.

I don't think creationists have any business being in schools either, for what it's worth, but I think it's easier for a teenager to sort out "Mr. Smith has no clue what he's talking about" vs "I have no clue what's true because the LLM everyone expects me to learn from just confirms everything I ask regardless of what I'm asking".

A big part of education is (should be) independent learning with textbooks and reading. You don't need to be "tutored".
That’s rather disingenuous. But it seems nowadays that words have lost meaning… so, I don’t blame you. I blame the LLMs for this deterioration.
lol scraping the bottom of the barrel
> AI in 1:1 tutor mode with proper hardware, harness and guardrails should be wildly successful

I’m open to the idea! Show me the evidence. Then we can roll it out to our kids.

“AI adoption raises homework scores by 18% and reduces completion time by 30%, but lowers monthly exam scores by 20% within six months. High-stakes entrance-exam scores fall by 18 and 24%, with the full penalty emerging only after about two years.”

Yup. Short-term metrics juice. Actual comprehension and cognition fall. This seems to be the case across the board, including with adults.

I’m genuinely optimistic that there is a way to make AI helpful in education. I just don’t think we’ve found it yet. (We certainly haven’t demonstrated it.)

> reduces completion time by 30%

This is probably the big problem, or at least one of them. If you spend less time learning, it will probably also be harder to remember what you learned. We need to spend some time on the material to make it stick.

The behavioral issue I see is that LLM users tend to immediately reach for an LLM and do their thinking in concert with it.

This tempts users to approach problems by first feeding them into the LLM and then simply following the route the LLM lays out, which does improve task completion time for tasks that the LLM can simply regurgitate, but it stops the user from developing the actual critical thinking skills that school is supposed to teach.

It’s not just critical thinking skills, it’s also that there’s a big difference between recognition/following instructions, and recall/generating your own memories of an approach. But most students don’t recognize the difference. In other words, “following the route” is a big part of the problem: it doesn’t engage the brain the same way and isn’t representative of real-world use, and having something explained well doesn’t mean you can in turn explain it well yourself (the more revealing test of true, internalized understanding).
Can agree on that.

The description of the paper also said:

AI users who maintain similar homework completion time as non-AI users experience small learning losses.

This was a surprise to me. I would have thought otherwise.

Would love to see some evidence on whether more or fewer people fall behind and get worse results. In my head the AI should be able to lift the weakest students a bit higher.

I think AI could (and by some students probably already is) be used to help a student better understand the material, and faster than you could before. I still recall some parts of Physics taking a while to click, and often having to reread different sections of a textbook to try and understand the what and why behind something.

The biggest issue is that a child has to want to do that, since they could also just ask the AI for the answer and then go back to playing video games. At the end of the day, past age 13 or so I just don't see legislation making any difference; they'll find a way past any law blocking them from using AI. Like a lot of education it'll probably come down to parenting.

> I think AI could (and by some students probably already is) be used to help a student better understand the material, and faster than you could before

I think so too. But we haven’t demonstrated we’ve found how, in kids or in adults.

> biggest issue is

We genuinely don’t know what the biggest issue is. We just know it doesn’t work. There is zero quality evidence for AI helping with learning or cognition in kids or adults. (Happy to be proven wrong. This is a fast-moving and big field.)

> they'll find a way past any law blocking them from using AI. Like a lot of education it'll probably come down to parenting

And community. Rich towns restrict devices in school, monitor use at home and thus will have less of an issue with AI exposure.

>I think so too. But we haven’t demonstrated we’ve found how, in kids or in adults.

Ask ChatGPT or Claude, on their highest model (probably unnecessary but I'm sensing a vibe), to explain a simple linear algebra problem, and if you don't understand it, ask about the part you don't understand.

And if you truly believe it made something up, prove it.

This is seriously the easiest thing to prove out there, you can see for yourself in the next 5 minutes.

A crucial part of learning is struggling with understanding and overcoming problems by yourself. AI removes that part.
>AI users who maintain similar homework completion time as non-AI users experience small learning losses.

Seems like there's no benefit even if it's used "correctly"?

Care to give us the bits you found interesting in the paper to spare me plonking down £6?

Would hate to dissect this just off a paragraph.

Considering that the paper concludes that even students who take the long approach and use LLMs in the most appropriate way for learning still retain less over the long term than students who simply don't use LLMs, I think it's likely they didn't read the paper in the first place.
I think AI should be used in higher-level schools, but with the added requirement that the output is held to a much higher standard and is fact-checked. Teach the students to use AI to reach a higher level while mitigating the inherent issues like hallucination and sycophancy.
fwiw, Alpha School is the supervised version. the New York campus is $65k/yr and not legally a school.

private school money with homeschool paperwork and an app doing the teaching.

https://www.wired.com/story/alpha-schools-new-york-city-camp...

We thought the same of electronic devices in general and digital learning content specifically. In actual practice both result in lowered test scores and declining critical thinking skills.
Idk why you screeching AI touts are so confident about its ‘wild’ success in all areas given absolutely zero evidence to that effect.

It’s tiresome.

It's inevitably your fault for prompting incorrectly or using the wrong model.
"You just have to repeat the prompt 3 times and then spin around counter-clockwise twice! That always works for me. You obviously just don't know how to prompt the model correctly."

Every time I see LLM enjoyers yapping on like this, it just reminds me of people trying to read tea leaves. There are all these goofy little rules about how to structure the prompt and how mean or nice to be to get it to work optimally, but I think it's obvious that most of these users are just seeing incidental successful outcomes in a largely random system and extrapolating from there because it makes them feel in control.

It is, quite literally, superstition.

Instead of prompts, let’s call them incantations.