by hgial 1744 days ago
It might be helpful for folks to look at the blog post written by one of the authors:

https://lukeoakdenrayner.wordpress.com/2021/08/02/ai-has-the...

or the paper itself

https://arxiv.org/pdf/2107.10356.pdf

I see a lot of "oh it's probably just picking up on x y z" when x, y, and z are things they explicitly checked for:

1) "It's probably just the names or other metadata" – they only gave it pixel data to train on. To control for things like metadata overlaid on the image (e.g., a name written on the image) they divided the images into 3x3 sections and trained classifiers on each section separately.

2) "It's probably some artifact of how the hospital marked up the images" – they used something like 7 different datasets from different hospitals and different modalities (X-Ray and CT).
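For point 1, the 3x3 split can be sketched roughly as below. This is not the paper's code: `split_into_grid` is a hypothetical helper, the image is assumed to be a plain list of pixel rows, and any leftover pixels that don't divide evenly are simply cropped.

```python
def split_into_grid(image, rows=3, cols=3):
    """Split a 2D image (list of pixel rows) into rows*cols equal tiles.

    Hypothetical helper for illustration; tiles are returned in
    row-major order (top-left first, bottom-right last).
    """
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // rows, width // cols  # remainder pixels are cropped
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tile = [
                row[c * tile_w:(c + 1) * tile_w]
                for row in image[r * tile_h:(r + 1) * tile_h]
            ]
            tiles.append(tile)
    return tiles
```

A separate classifier would then be trained on the tiles at each of the nine positions. If only the tile containing burned-in text could predict race, that would point to a metadata leak.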

If it is cheating somehow, it's not doing it in an obvious way that you can think of in a minute or two. Also note that they had more than just medical folks working on the paper; the author list includes plenty of computer scientists. It's unlikely they're making an elementary ML mistake here.


One major source of risk I see is that the amount of training data isn't the same for each race. For white vs. black patients, there's between a 2:1 and 3:1 imbalance in both the training and test data (and a much larger imbalance for Asian patients, as high as 20:1 in some of these categories).

This gives the CNN more information on one race than another, which can create a classifier that performs very well on the training and test data it has access to but then fails spectacularly on data outside the training set (because the source isn't representative of the total variance in the global population).
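One way to check whether imbalance is inflating a headline number is to report per-class recall alongside overall accuracy. A minimal sketch, using made-up toy labels (not data from the paper) and a hypothetical `per_class_recall` helper:

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}

# Invented toy data with a 20:1 imbalance between classes "A" and "B".
y_true = ["A"] * 20 + ["B"] * 1
# A degenerate model that always predicts the majority class.
y_pred = ["A"] * 21

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = per_class_recall(y_true, y_pred)
```

Here overall accuracy looks high even though the minority class is never predicted correctly, which is why per-class numbers matter when the groups are this uneven.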

They tested on tons of different external datasets, and at least one of the training datasets was balanced. Same results were obtained.