by colinmorelli 391 days ago
For what it's worth, this statement is no longer entirely correct. Top-end models today are on par with the diagnostic capabilities of the average physician (across many specialties) and, in some cases, can outperform physicians when augmented via RAG with vetted clinical guidelines (like NIH data, UpToDate, etc.).

However, they do have particular types of failure modes that they're more prone to, and this is one of them. So they're imperfect.


This is ChatGPT's self-assessment. Perhaps, however, you mean a specialized agent with RAG + evals.

ChatGPT is not reliable for medical diagnosis.

While it can summarize symptoms, explain conditions, or clarify test results using public medical knowledge, it:
• Is not a doctor and lacks clinical judgment
• May miss serious red flags or hallucinate diagnoses
• Doesn’t have access to your medical history, labs, or physical exams
• Can’t ask follow-up questions like a real doctor would

Sorry, I should have clarified, but no, this is not ChatGPT's self-assessment.

I am suggesting that today's best-in-class models (Gemini 2.5 Pro and o3, for example), when given the same context a physician has access to (labs, prior notes, medication history, diagnosis history, etc.) and an appropriate eval loop, can achieve similar diagnostic accuracy.

I am not suggesting that patients turn to ChatGPT for medical diagnosis, that these tools be made available to patients to self-diagnose, or that physicians can or should be replaced by an LLM.

But there absolutely is a role for an LLM to play in diagnostic workflows to support physicians and care teams.