by nopinsight
1265 days ago
Related: "Large Language Models Encode Clinical Knowledge" https://arxiv.org/abs/2212.13138

"On the MedQA dataset consisting of USMLE style questions with 4 options, our Flan-PaLM 540B model achieved a multiple-choice question (MCQ) accuracy of 67.6%..."

"The percentages of correctly answered items required to pass varies by Step and from form to form within each Step. However, examinees typically must answer approximately 60 percent of items correctly to achieve a passing score." -- https://www.usmle.org/bulletin-information/scoring-and-score...

It seems like the models in the paper could already pass the USMLE: 67.6% accuracy on USMLE-style questions is above the roughly 60% typically needed to pass.

Some tests suggest that Med-PaLM is close to human clinicians in many aspects, including reasoning (Figures 6-7). However, other tests show that Med-PaLM still returns inappropriate or incorrect results much more often than clinicians do (Figure 8).