| Always worth a share for this scenario. It's not clear if LLMs are capable of doing actual analysis on medical imaging. For details see this article https://futurism.com/artificial-intelligence/frontier-models... > As detailed in a new, yet-to-be-peer-reviewed paper, a team of researchers at Stanford University found that frontier AI models readily generated “detailed image descriptions and elaborate reasoning traces, including pathology-biased clinical findings, for images never provided.” > In other words, the AI models happily came up with answers to questions about a supposedly accompanying image — even if the researchers never even showed it an image. > As opposed to hallucinations, which involve AI models arbitrarily filling in the gaps within a logical framework, the team coined a new term for the phenomenon: “mirage reasoning.” > The effect “involves constructing a false epistemic frame, i.e., describing a multi-modal input never provided by the user and basing the rest of the conversation on that, therefore changing the context of the task at hand,” the researchers wrote in their paper. > The damning findings suggest AI models cheat by diving into the data they were given — and coming up with the rest based on probability, even if it’s almost entirely conjecture. |
I know you can’t trust an LLM’s self-assessed “confidence” in a prediction, but I’ve found that it can at least be directionally correct for some tasks. For our benchmarks, however, confidence was poorly correlated with accuracy. What’s worse, binary classification prompts (“Do you see $diagnosis in this photo?”) strongly influenced the LLM to confidently predict $diagnosis.
I’m concerned for those who use LLMs for diagnostics and get confidently led to the wrong conclusion.
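For anyone who wants to run the same kind of check on their own evals, here is a minimal sketch of one way to measure whether self-reported confidence tracks correctness. It is not the benchmark code mentioned above: the function names and the toy data are made up for illustration, and it assumes each prediction comes with a confidence in [0, 1] and a ground-truth correct/incorrect flag.

```python
from statistics import mean


def confidence_accuracy_correlation(confidences, correct):
    """Pearson correlation between confidence and 0/1 correctness.

    With a binary variable this is the point-biserial correlation.
    Returns NaN if either input has zero variance.
    """
    xs = [float(c) for c in confidences]
    ys = [1.0 if ok else 0.0 for ok in correct]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def accuracy_by_confidence_bin(confidences, correct, n_bins=5):
    """Mean accuracy per equal-width confidence bin (None for empty bins)."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append(1.0 if ok else 0.0)
    return [mean(b) if b else None for b in bins]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    conf = [0.95, 0.90, 0.85, 0.60, 0.55, 0.30]
    ok = [True, False, True, False, True, False]
    print(confidence_accuracy_correlation(conf, ok))
    print(accuracy_by_confidence_bin(conf, ok))
```

A correlation near zero, or per-bin accuracy that does not rise with confidence, is what "poorly correlated" looks like in practice. Running it separately on leading yes/no prompts and on open-ended prompts would show whether the framing shifts confidence without improving accuracy.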