| HN Mirror



	by solid_fuel 2 hours ago
	Disagree. AI has no business being used in 1:1 tutor mode before the hallucination and sycophancy issues are completely resolved. As is, I can easily see it being a hindrance to building actual understanding. Just one example - it's very common to see ChatGPT and the like respond with "you're absolutely correct! Great insight" to something that is a complete misunderstanding.

3 comments

ndriscoll 1 hour ago

This is specifically a consumer-model (or specifically ChatGPT) issue. E.g., IME Codex does not do this and will just tell you when you're missing something or are somehow wrong, while Gemini does this weird thing where it tells you you're a genius and then immediately starts correcting everything you said.


solid_fuel 56 minutes ago

Sycophancy is just one aspect of the problems I mentioned, though. Another huge one is hallucination, which is actually far worse than I thought:

> It’s been proven that when a model is trained on large volumes of highly factual and non-theoretical data, it learns to always have an answer. DeepSeek V4 Pro (1.6T params, 49B active, 44 AA Intelligence Index score) has a ludicrous 94% hallucination score on the AA-Omniscience benchmark, meaning on questions that it couldn’t figure out, it only stated that it didn’t know around 6% of the time, and the rest it confidently hallucinated an answer. GLM-5.2 scored a 28% hallucination rate, Opus 4.8 was 36%, Fable 5 was 48%, and GPT-5.5 was 86%.

https://arrowtsx.dev/bigger-models/

I think even a 5% hallucination rate would be terrible for a teacher, who should generally be comfortable with saying "I don't know off the top of my head but here is how to find resources to answer your question".
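To make the metric in that quote concrete, here is a minimal sketch of the arithmetic as the quote describes it: of the questions the model could not answer correctly, what share got a confident wrong answer instead of an "I don't know". The function name and the counts are hypothetical and chosen only to reproduce the 94% figure; this is not the benchmark's official scoring code.

```python
def hallucination_rate(wrong_answers: int, abstentions: int) -> float:
    """Share of not-correctly-answered questions where the model
    gave a wrong answer instead of saying it didn't know.

    Assumes the reading given in the quoted article; the benchmark's
    own scoring may differ in details.
    """
    missed = wrong_answers + abstentions
    if missed == 0:
        # No missed questions, so nothing to hallucinate on.
        return 0.0
    return wrong_answers / missed


# Hypothetical counts: out of 100 questions the model could not answer,
# it admitted ignorance 6 times and made something up 94 times.
rate = hallucination_rate(wrong_answers=94, abstentions=6)
print(f"{rate:.0%}")  # prints "94%"
```

Note that under this reading the rate says nothing about how often the model is right overall, only about what it does when it isn't.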

---

So, just to drive the point home, Codex has an 86.9% hallucination rate on AA-Omniscience according to this index (https://benchlm.ai/models/gpt-5-3-codex). If you ask it something that wasn't sufficiently covered in its training data, it will confidently make up an answer nearly 87% of the time.

While you might think it is happy to correct you when you are wrong, you don't know that for sure since you don't know when you're wrong. Codex may have been happily agreeing with you about things you had completely backwards.


ndriscoll 9 minutes ago

Except I generally do know when I'm wrong because I'm working in a domain I am familiar with, and it will often create experiments on the fly unprompted (well, prompted, but generically in AGENTS.md) to check itself. My experience actually using it for software is that it almost never makes up answers.


JumpCrisscross 2 hours ago

Just realized 1:1 AI is 90s self-esteem medals-for-everyone parenting on steroids.


therealdrag0 2 hours ago

Teachers hallucinate too. I’ve had creationists and communists and tin-foil-hat (chem trails, 5g, etc) teachers. Surely you can imagine an AI tutor that has higher than zero ROI.


solid_fuel 2 hours ago

> I’ve had creationists and communists and tin-foil-hat (chem trails, 5g, etc) teachers.

I certainly have, too, but there is still a difference between a person who has a factually incorrect but consistent worldview and an LLM which simply reflects the worldview of the user or even changes between queries.

I don't think creationists have any business being in schools either, for what it's worth, but I think it's easier for a teenager to sort out "Mr. Smith has no clue what he's talking about" vs "I have no clue what's true because the LLM everyone expects me to learn from just confirms everything I ask regardless of what I'm asking".


beejiu 2 hours ago

A big part of education is (should be) independent learning with textbooks and reading. You don't need to be "tutored".


geraneum 2 hours ago

That’s rather disingenuous. But it seems nowadays that words have lost meaning… so, I don’t blame you. I blame the LLMs for this deterioration.


seiie 7 minutes ago

lol scraping the bottom of the barrel
