by n-e-w 915 days ago
I try not to immediately call BS on these types of studies…but in this case there are some concerns.

“The data sets were randomly divided into training (85%) and test (15%) sets. We used 10-fold cross-validation to obtain generalized results of model performance. Data splitting was performed at the participant level and stratified based on the outcome variables. Because the data classes were imbalanced for symptom severity (ADOS-2 and SRS-2), we performed a random undersampling of the data at the participant level before conducting data splitting. Moreover, we examined different split ratios (80:20 and 90:10) to assess the robustness and consistency of the predictive performances across diverse splitting proportions.”

* Undersampling is problematic here and probably introduced some bias. These imbalanced class problems are just plain hard. Claiming one hundred percent on an imbalanced class problem should probably cause some concern.
* A data split at the participant level has to be done really carefully or you’ll overfit.
* Multiple comparisons bias from testing multiple split ratios on the same test data. Same with the 10-fold cross-validation.
* Not sure if they validated results on any external test data.
* Outcome variable stratification also has to be done really carefully or it will introduce bias; seems particularly sensitive in this case.
* Using severity of symptoms as class labels is problematic. These really have to have been diagnosed the same way / consistently to be meaningful.
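
To make the participant-level splitting and undersampling concerns concrete, here is a minimal Python sketch of one way to keep them from leaking. It is not the paper's pipeline: the record layout `(participant_id, label, image_id)`, the function names, and the toy data are all assumptions made up for illustration. It splits by participant (stratified by label) first, then undersamples the training participants only, so the test set keeps its original class balance.

```python
import random
from collections import defaultdict


def split_by_participant(records, test_fraction=0.15, seed=0):
    """Split (participant_id, label, image_id) records so that no participant
    appears in both sets. Assumes each participant has exactly one label."""
    rng = random.Random(seed)
    by_label = defaultdict(set)
    for pid, label, _ in records:
        by_label[label].add(pid)

    test_ids = set()
    for label in sorted(by_label):
        pids = sorted(by_label[label])
        rng.shuffle(pids)
        # Take the same fraction of participants from every class (stratification).
        n_test = max(1, round(len(pids) * test_fraction))
        test_ids.update(pids[:n_test])

    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test


def undersample_train(train, seed=0):
    """Drop random participants from the larger classes, in the training set only."""
    rng = random.Random(seed)
    by_label = defaultdict(set)
    for pid, label, _ in train:
        by_label[label].add(pid)

    n_min = min(len(pids) for pids in by_label.values())
    keep = set()
    for label in sorted(by_label):
        pids = sorted(by_label[label])
        rng.shuffle(pids)
        keep.update(pids[:n_min])
    return [r for r in train if r[0] in keep]


# Toy data (hypothetical): 100 participants, two images each, 20 "severe" and 80 "mild".
records = []
for i in range(100):
    pid = f"p{i:03d}"
    label = "severe" if i < 20 else "mild"
    for eye in ("L", "R"):
        records.append((pid, label, f"{pid}_{eye}"))

train_set, test_set = split_by_participant(records, test_fraction=0.15, seed=0)
balanced_train = undersample_train(train_set, seed=0)
```

Because both images of a participant always travel together, the model is never tested on an eye whose partner it saw during training, and the untouched test set still reflects the real class imbalance.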

I also note that these images were collected over a long period (15 years iirc). Hard to believe such a diverse set of images (collection, equipment, etc.) led to perfect results.

ML issues aside, super interested in the basic medical concept. I wasn’t aware retinal abnormalities could be indicative of issues like ASD.


Another potential issue:

> The photography sessions for patients with ASD took place in a space dedicated to their needs, distinct from a general ophthalmology examination room. This space was designed to be warm and welcoming, thus creating a familiar environment for patients. Retinal photographs of typically developing (TD) individuals were obtained in a general ophthalmology examination room. Each eye required an average of 10–30 s for photography, although some cases involved longer periods to help the patient calm down, sometimes exceeding 5–10 min. All images were captured in a dark room to optimize their quality. Retinal photographs of both patients with ASD and TD were obtained using non-mydriatic fundus cameras, including EIDON (iCare), Nonmyd 7 (Kowa), TRC-NW8 (Topcon), and Visucam NM/FA (Carl Zeiss Meditec).

So three questions:

1. Are we positive that the difference in rooms does not affect these images?

2. If we are in a dark room, and ASD patients are in it for 5-10 minutes longer, are we sure this doesn't affect the retina?

3. Were all cameras used for both ASD and TD images?

Want to make sure the AI is being trained to detect autism, and wasn't accidentally trained to identify camera models, length-in-dark-room or room-welcomingness.

Hopefully not, but I assume you have to be very careful with this sort of thing when the model is entirely black-box and you can't validate what it's actually doing inside.
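
One cheap sanity check for this kind of leak, sketched in Python: see how well a single piece of acquisition metadata predicts the label on its own, with no image involved. The field names (`camera`, `room`) and every count below are hypothetical; only the room/label pairing follows the quoted passage, and the paper does not report a per-camera breakdown.

```python
from collections import Counter, defaultdict


def metadata_only_accuracy(rows, key):
    """Accuracy of guessing the most common label for each value of one
    metadata field, ignoring the image entirely."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[key]][row["label"]] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(rows)


def majority_baseline(rows):
    """Accuracy of always guessing the overall most common label."""
    labels = Counter(row["label"] for row in rows)
    return labels.most_common(1)[0][1] / len(rows)


# Made-up metadata: room is perfectly tied to the label, camera only partly.
rows = (
    [{"camera": "A", "room": "dedicated", "label": "ASD"}] * 40
    + [{"camera": "B", "room": "dedicated", "label": "ASD"}] * 10
    + [{"camera": "A", "room": "general", "label": "TD"}] * 10
    + [{"camera": "B", "room": "general", "label": "TD"}] * 40
)

baseline = majority_baseline(rows)
camera_acc = metadata_only_accuracy(rows, "camera")
room_acc = metadata_only_accuracy(rows, "room")
print(baseline, camera_acc, room_acc)
```

If a metadata field alone scores well above the baseline, an image model can reach the same score by recognizing that field's visual fingerprint, so a high headline accuracy says little until the field is balanced across groups or held out.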

This is definitely worthy of concern. There's an infamous case where an AI was trained to detect cancer from imaging but all the positive examples included a ruler (to measure the tumor) so it turned out it just was good at detecting rulers. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674813/#:~:tex....
If they consistently captured the images in different settings, then I guarantee you that that’s what the AI learned.

Just being in a dark room longer is sufficient to make changes that an AI could pick up on.

Darn, was excited for a minute. This sort of experiment needs double blinding.

Ideally, they should capture the images from children before diagnosis, then see if they can predict the diagnosis.

Reminds me of the classic apocryphal early ML story of the enemy tank detector that was 100% accurate at identifying camouflaged tanks… so long as tanks and sunny weather were perfectly correlated in the input data, just as they were in the training data.
It appears they also report good results for predicting symptom severity. It's less obvious how the cameras etc. would leak into severity. Unless it actually works (it does seem a bit too good to be true), I'm thinking the test set was in the base model or something.
Unsure, but there are lots of variables there, and there could be even more we don't know about that aren't mentioned in the paragraph! Maybe more severe cases involved longer periods to help the patient calm down in the dark environment? I dunno! Something just smells fishy. You are right, it could also have been training data leaking; it just looks like there are potentially multiple leaky elements here!

Also, the study confirmed that ASD participants were autistic through structured interviews with psychologists against the DSM-5, but the TD participants were never assessed by psychologists, so if autism under-diagnosis is a thing, there could theoretically be false negatives.

You are in a desert, you see a turtle on its back. What do you do?
If we can diagnose autism by measuring how long it takes to take a picture, isn't that even better?
Came here to say this. 100% is too good to be true, and the AI has almost certainly picked up a signal leak from the camera, image format, room, etc.
Yes! I would also be surprised if the ground truth didn't have some errors in it.

If a model was 100% accurate, then considering the nature/accuracy of manually diagnosing autism, you would probably expect the AI to either find new cases or identify a few incorrect diagnoses.
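
A back-of-the-envelope version of this argument, as a sketch: if each reference diagnosis is independently wrong with some small probability, the chance that a test set contains no wrong labels at all shrinks quickly with its size. The error rate and test-set size below are hypothetical, not taken from the study, and the independence assumption is a simplification.

```python
def prob_no_label_errors(n_cases: int, label_error_rate: float) -> float:
    """Probability that all n_cases reference labels are correct,
    assuming each label is wrong independently with the given rate."""
    return (1.0 - label_error_rate) ** n_cases


# Hypothetical numbers for illustration only.
p_clean = prob_no_label_errors(n_cases=100, label_error_rate=0.02)
print(f"P(no wrong labels among 100 cases) = {p_clean:.3f}")
```

Under these assumed numbers, a model that tracked the underlying condition rather than the recorded label would be expected to disagree with the reference on at least one case most of the time, so a perfect match is itself a reason to look for leakage.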

Also concussions according to the article, which is news to this retired neurosurgical anesthesiologist (38 years in practice; stopped in 2015 at age 67 because I believed [and still do] that it's better to retire [from my profession, at least] too early than too late).