by AlexCoventry 402 days ago
It depends on the AI. ChatGPT's higher models (o1-pro/o3/o4-mini-high) have some kind of limited capability to detect errors in the user's thinking, and have relatively few hallucinations.
2 comments

o3 has twice the hallucinations of o1, according to OpenAI's own hallucination benchmark.
I've had fun debates about things like p-zombies with Gemini 2.5 Pro.