by hestefisk
796 days ago
“What this work shows is that the knowledge and reasoning ability of these large language models in an eye health context is now almost indistinguishable from experts,” said Arun Thirunavukarasu, the lead author of a paper on the findings published in the journal PLOS Digital Health. FTFA.
From the good folks at AI Snake Oil[1]:
> Memorization is a spectrum. Even if a language model hasn’t seen an exact problem on a training set, it has inevitably seen examples that are pretty close, simply because of the size of the training corpus. That means it can get away with a much shallower level of reasoning....In some real-world tasks, shallow reasoning may be sufficient, but not always. The world is constantly changing, so if a bot is asked to analyze the legal consequences of a new technology or a new judicial decision, it doesn’t have much to draw upon. In short, as Emily Bender points out, tests designed for humans lack construct validity when applied to bots.
> On top of this, professional exams, especially the bar exam, notoriously overemphasize subject-matter knowledge and underemphasize real-world skills, which are far harder to measure in a standardized, computer-administered way. In other words, not only do these exams emphasize the wrong thing, they overemphasize precisely the thing that language models are good at.
Also[2]:
> Undoubtedly, AI and LLMs will transform every facet of what we do, from research and writing to graphic design and medical diagnosis. However, its current success in passing standardized test after standardized test is an indictment of what and how we train our doctors, our lawyers, and our students in general. ChatGPT passed an examination that rewards memorizing the components of a system rather than analyzing how it works, how it fails, how it was created, how it is maintained. Its success demonstrates some of the shortcomings in how we train and evaluate medical students. Critical thinking requires appreciation that ground truths in medicine continually shift, and more importantly, an understanding how and why they shift. Perhaps the most important lesson from the success of LLMs in passing examinations such as the USMLE is that now is the time to rethink how we train and evaluate our students.
[1] https://www.aisnakeoil.com/p/gpt-4-and-professional-benchmar...
[2] https://journals.plos.org/digitalhealth/article?id=10.1371/j...