by nopinsight 1265 days ago
Related: "Large Language Models Encode Clinical Knowledge" https://arxiv.org/abs/2212.13138

"On the MedQA dataset consisting of USMLE style questions with 4 options, our Flan-PaLM 540B model achieved a multiple-choice question (MCQ) accuracy of 67.6%..."

"The percentages of correctly answered items required to pass varies by Step and from form to form within each Step. However, examinees typically must answer approximately 60 percent of items correctly to achieve a passing score." -- https://www.usmle.org/bulletin-information/scoring-and-score...

It seems like the models in the paper could pass USMLE already.

Some tests suggest that Med-PaLM is close to human clinicians in many respects, including reasoning (Figures 6-7). However, other tests show that Med-PaLM still returns inappropriate or incorrect results much more often than clinicians do (Figure 8).

1 comment

I'm kind of surprised the model doesn't score higher, as there is a clear pattern to the questions + answers and there would be a huge amount of training data for USMLE. But as stated elsewhere, there is an enormous gap between passing exams and treating real patients as a doctor. Practice is rarely about making the obscure diagnoses found in exam questions; it's about managing illness in the context of a patient and their lifestyle, with many very human aspects - difficult communication, ethics & assessing family dynamics. Written exams only assess whether a medical student has the minimum knowledge required to practice, and there are lots of practical exams and communication scenarios required as well. It may well be the same for lawyers - passing the bar does not really relate to actual day-to-day practice.