> In practice, we just use a cutoff of 4cm. In other words, we extract a single feature, and apply a single nonlinear activation function to that feature to decide whether or not to activate the 'treat' signal. We've replaced all that vaunted human judgement and mental modeling of the body with a heuristic that has equivalent power to a single neuron neural network. I appreciate that there's a dearth of training data, and it varies in quality. But this is precisely the kind of thing where sufficiently powerful ML could do better than the simple heuristics we can come up with. The heterogeneity of imaging types doesn't have to be a problem. Train the model on all the data, and all the different kinds of scan, all the anatomical knowledge. Look at how LLMs are able to do stuff like write code that has comments written in pirate speak. Do you think they learned how to do that by studying a large body of code with pirate-speak comments in? No. They picked up examples of pirate speak in one context, and code in another context, and they're able to combine them together in ways that make sense. ML models looking at diagnoses with small training sets that are largely in obsolete scan formats could still, in theory, learn how to spot those diagnoses in more modern scan images, because they have learnt from other, much larger datasets how things in the newer format correlate with features in the old form.
No one is arguing that ML can't segment and measure a structure; that is the lowest-hanging fruit. ML can't diagnose an adrenocortical carcinoma (an example of a rare disease) because medicine itself doesn't know how to.
> In other words, we extract a single feature, and apply a single nonlinear activation function to that feature to decide whether or not to activate the 'treat' signal.
Now do this for the more than 1,000 other possible diagnoses on a CT abdomen, and have it be as fast as a human, with an equal or better ROC curve, in under 5 seconds, and cheaper than the $70 a radiologist bills for the read. Unless you can eliminate the need for someone like me to read the scan, an ML model that measures the adrenal glands is worth $0.
I'm aware of the literature in this space. Your proposal is not novel and has been attempted. As soon as you try doing this on more than a handful of (typically easy) diagnoses, it stops working. Currently, the only useful models flag studies as normal or abnormal to triage interpretation priority.
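For reference, the single-feature cutoff described in the quote above really is this small. The following is only a minimal sketch: the function names, the unit weight, and the choice of `>=` at exactly 4 cm are my own assumptions for illustration, not taken from any guideline.

```python
# Hypothetical sketch of the quoted "single neuron" heuristic.
# All names and the >= boundary at exactly 4 cm are illustrative assumptions.
CUTOFF_CM = 4.0  # the 4 cm cutoff mentioned in the quote


def step(x: float) -> int:
    """Heaviside step activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def treat_signal(size_cm: float, weight: float = 1.0, bias: float = -CUTOFF_CM) -> int:
    """One input feature, one weight, one bias, one step activation."""
    return step(weight * size_cm + bias)


def treat_by_cutoff(size_cm: float) -> bool:
    """The same decision written as a plain threshold comparison."""
    return size_cm >= CUTOFF_CM
```

Measuring the size is the easy part; the sketch says nothing about the other diagnoses on the same scan.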
> Look at how LLMs are able to do stuff like write code that has comments written in pirate speak.
This anecdote doesn't prove anything. We can instead look at OpenAI's own white paper for more rigorous data on hallucination and accuracy. LLMs aren't ready for a production CRUD app, let alone for work where human life is at stake.
> ML models looking at diagnoses with small training sets that are largely in obsolete scan formats could still
It's not obsolete. It's a completely different image type. This is akin to saying an ML model trained on black-and-white sketches can paint the Mona Lisa in color.
> all the anatomical knowledge.
This misunderstands the problem. The anatomy is easy. The pathology is updated every 1-5 years, so there is no historical dataset.