Exactly: what if my radiologist doesn’t know my particular rare condition?
I would feel comfortable removing the human diagnostician. Let’s have actual human doctors acting as researchers working to improve the AI diagnostics.
My perspective, as both a radiologist and a CS/AI researcher (so exactly what you're proposing):
1. We don't practice in imaginary vacuums. It's easy* to identify that something looks abnormal and then refer to clinical resources/other physicians for rare diseases with specific questions in mind (i.e. recognizing the nuanced imaging finding and referring to a resource to assist).
As a rare disease example, tumors of the eyeball/orbit are very rare, but detecting them is not. If I open the case and see one, I can refer to StatDx to help me narrow my differential, knowing what imaging features I'm looking for. This reduces my misdiagnosis rate (which, as an aside, is ~2-5% for radiologists; "major" clinically significant errors that impact morbidity or mortality are ~2-10% of those, depending on the study).
2. Rare diseases are hard to diagnose, and would likely also be hard for an AI. Imaging appearances are not unique to the vast majority of diseases, especially what we call the "weird and wonderful".
Pelvic tuberculosis, endometriosis, advanced cervical cancer and advanced rectal cancer can look identical/nearly identical on MRI, and the clinical portion as well as additional testing helps us get to the diagnosis.
We don't have to diagnose everything based on a single imaging test, nor should we given:
3. Diagnosis is a tradeoff of sensitivity and specificity. You can't have both.
Let's consider adrenal gland tumors. Statistically these are going to be benign, and there is no specific imaging feature to tell a small adrenocortical carcinoma (ACC, ~1 in 1 million incidence) from an adrenal adenoma (99% of adrenal lesions).
We also can't tell them apart with a biopsy under a microscope.
If you're unlucky enough to get an ACC you're basically shit out of luck, as the only options we have are to recommend adrenal surgery (and its complications, which can include death) to optimize sensitivity, or to assume that it's benign and optimize for specificity considering disease prevalence and risk of overdiagnosis.
In practice, we just use a cutoff of 4 cm. I'm not sure how an AI would solve this, especially as there isn't a large enough training set. MD Anderson has the most experience of any center and they've had ~600 cases in 40+ years, which, as you can imagine, encompasses a very heterogeneous imaging set (we didn't have multidetector CT or 3T abdominal MRI 20+ years ago).
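To make the sensitivity/specificity point concrete, here is a rough sketch of how positive predictive value behaves when a disease is rare. Only the formula (Bayes' rule) is standard; the sensitivity, specificity, and prevalence values are illustrative assumptions, not figures from any study or from the numbers quoted above.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics (assumed for illustration only).
SENSITIVITY = 0.95
SPECIFICITY = 0.95

# Same test, two assumed prevalences: a rare finding vs. a common one.
ppv_rare = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.01)
ppv_common = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.30)

print(f"PPV at 1% prevalence:  {ppv_rare:.2f}")
print(f"PPV at 30% prevalence: {ppv_common:.2f}")
```

With these assumed numbers, most positives at 1% prevalence are false positives, which is why pushing sensitivity up for a rare disease translates into a lot of unnecessary workup or surgery.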
Overall, AI can and should help radiologists, and as someone involved in this field I can't envision a world where we can safely remove the human diagnostician element from the mix, given that it's a spectrum of grey, not black/white labelling as it is for object detection.
We've had attempts with mammography and stroke AI, and they're still horrendously inaccurate compared to what I expect out of a resident radiologist, let alone an experienced staff physician.
A great post. I can attest the same about information extraction from semi-structured documents. The situation is far from full autonomy. Can't do anything without a human in the loop, not even with the latest AI.
I am seeing this trend - everyone can explain at length why AI is "not quite up there" in their own field, but believes it's "near AGI" in other fields. We find it hard to imagine future difficulties AI will have to face in general, we can only do that in our own field where we have learned from direct experience.
Exactly this! I chose radiology given my background thinking I could "easily build radiology AI systems" and help our struggling system.
Then I became a radiologist and quickly discovered how hard this is. Something "as simple" as NER and entity-linking on radiology reports is damn near impossible at the moment (even with SOTA LLMs which have made it easier but still not accurate enough for production use).
NER, Entity linking, and relationship extraction definitely seem to be 'low hanging fruit' due to LLM improvements, but one of the big problems is that they really need a completely different architecture to limit the decoder vocab if using a decoder transformer for producing the set of sequences in relation extraction with specific entity ids.
A Longformer with full attention to the input sequence, and sliding window attention to a large dictionary, could be a decent way to fine-tune a system like this, but there are few that try it.
Unfortunately there's a lot of stupidity going around right now in thinking the answer is just to 'pRoOoMpT tHe LLm RiGhT', but that will always be exceedingly wasteful such that processing terabytes of files will be prohibitively expensive, and there's no guarantee the system will always restrict to the specific vocab and structure desired.
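As a minimal sketch of what "limiting the decoder vocab" can mean at a single decoding step: mask every logit outside an allowed set of entity ids before picking the next token. This is generic logit masking, not the Longformer setup described above; the function name, the toy logits, and the id set are all hypothetical.

```python
import math


def constrained_argmax(logits, allowed_ids):
    """Return the index of the best-scoring token among allowed_ids only."""
    if not allowed_ids:
        raise ValueError("allowed_ids must not be empty")
    # Tokens outside the allowed set get -inf so they can never win.
    masked = [
        score if idx in allowed_ids else -math.inf
        for idx, score in enumerate(logits)
    ]
    return max(range(len(masked)), key=masked.__getitem__)


# Toy 5-token vocabulary; only ids 1, 2 and 4 stand for valid entity ids.
logits = [2.5, 0.1, 1.7, 3.9, 0.3]
entity_ids = {1, 2, 4}

unconstrained = max(range(len(logits)), key=logits.__getitem__)
chosen = constrained_argmax(logits, entity_ids)
print(unconstrained, chosen)
```

The unconstrained pick lands on a token outside the entity set, while the masked pick is guaranteed to stay inside it. That guarantee is the part prompting alone can't give you.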
The images in radiology definitely make these types of things harder, and the sparsity is an enormous issue. However, working with some projects in this area, I don't think it's as impenetrable as a lot of radiologists in AI suggest. The main thing needed in the field is adoption of better techniques and architectures to deal with these problems.
I agree it's not impenetrable, that's why I'm working on this problem. What I disagree with is the "this is trivial" statements.
> Unfortunately there's a lot of stupidity going around right now in thinking the answer is just to 'pRoOoMpT tHe LLm RiGhT'
I agree that this is not the right approach despite all the media hype; my research has been (more or less) attempting what you've proposed.
> A Longformer with full attention to input sequence, and sliding window attention to a large dictionary could be a decent way to fine-tune a system like this, but there are few that try it.
Good idea, although I'm biased as we tried this ourselves! Problem is the dictionary (ontology) doesn't exist. RadLex and UMLS are far too inadequate in coverage. Actively working to address the gaps and hope to have something to open-source within the next couple of months.
I previously worked for a Radiology PACS and it's hard to get funding/interest to even tackle the problem. With how lucrative it could be, I would think that a corporation would be very interested in putting resources into it, but this has not been my experience.
No PACS that I know of even wants to tackle digital pathology in a significant way, which last I heard had about 5% adoption versus glass slides.
In other words, we extract a single feature, and apply a single nonlinear activation function to that feature to decide whether or not to activate the 'treat' signal. We've replaced all that vaunted human judgement and mental modeling of the body with a heuristic that has equivalent power to a single neuron neural network.
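Spelled out as a sketch, the "single neuron" reading of the 4 cm rule looks roughly like this. The weight and bias are chosen so the step fires at the cutoff; whether exactly 4.0 cm counts as positive is an assumption here, and the function names are made up for illustration.

```python
def step(z):
    """Heaviside step activation: fire when the input is non-negative."""
    return 1 if z >= 0 else 0


def treat_signal(lesion_size_cm, weight=1.0, bias=-4.0):
    """One feature, one weight, one bias: fires when size >= 4 cm (assumed)."""
    return step(weight * lesion_size_cm + bias)


# Hypothetical lesion sizes in cm, not real cases.
sizes = (1.5, 3.9, 4.0, 6.2)
decisions = [treat_signal(s) for s in sizes]
print(decisions)
```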
I appreciate that there's a dearth of training data, and it varies in quality. But this is precisely the kind of thing where sufficiently powerful ML could do better than the simple heuristics we can come up with.
The heterogeneity of imaging types doesn't have to be a problem. Train the model on all the data, and all the different kinds of scan, all the anatomical knowledge.
Look at how LLMs are able to do stuff like write code that has comments written in pirate speak. Do you think they learned how to do that by studying a large body of code with pirate-speak comments in? No. They picked up examples of pirate speak in one context, and code in another context, and they're able to combine them together in ways that make sense.
ML models looking at diagnoses with small training sets that are largely in obsolete scan formats could still, in theory, learn how to spot those diagnoses in more modern scan images, because they have learnt from other, much larger datasets how things in the newer format correlate with features in the old form.
You're missing the point: it takes me < 5 seconds to clear the adrenals. The example is intended to illustrate that there is no feature to extract that would make a model BETTER than a human for the things that we care about (rare and challenging diagnoses).
No one is arguing that ML can't segment and measure a structure, this is the lowest hanging fruit. ML can't diagnose an adrenocortical carcinoma (an example of a rare disease) because medicine doesn't know how to.
> In other words, we extract a single feature, and apply a single nonlinear activation function to that feature to decide whether or not to activate the 'treat' signal.
Now do this for the > 1000 other possible diagnoses on a CT abdomen, and have it be as fast as a human, with an equal or better ROC curve, in under 5 seconds, and cheaper than the $70 a radiologist bills for this. Unless you can eliminate having someone like me read this scan, an ML model to measure the adrenal glands is worth $0.
I'm aware of the literature in this space. Your proposal is not novel and has been attempted. As soon as you try doing this on more than a handful of (typically easy) diagnoses it stops working. Currently the only useful models flag normal/abnormal to triage interpretation priority.
> Look at how LLMs are able to do stuff like write code that has comments written in pirate speak.
This anecdote doesn't prove anything but we can instead look at OpenAI's own white paper for their more rigorous data on hallucination and accuracy. LLMs aren't ready for a production CRUD app let alone human life.
> ML models looking at diagnoses with small training sets that are largely in obsolete scan formats could stil
It's not obsolete. It's a completely different image type. This is akin to saying an ML model trained on black and white sketches can paint the Mona Lisa in color.
> all the anatomical knowledge.
A misunderstanding of the problem. The anatomy is easy. The pathology is updated every 1-5 years so there is no historical dataset.