I do research in computer vision and this paper is so bad it's beyond words.

* They give the network a huge advantage: they teach it that it should say "no" 80% of the time. The training data is unbalanced (80% no vs. 20% yes), and so is the test data. Of course it does well! I don't care what they do at training time, but the test data should be balanced or they should correct for this in the analysis.
* They measure the wrong things, and those things reward the network. Because the dataset is imbalanced you can't use an ROC curve, sensitivity, or specificity. You need to use precision and recall and make a PR curve. This is machine learning and stats 101.
* They measure the wrong thing about humans. What a doctor does is decide how confident they are and then refer you for a biopsy. They don't eyeball it and go "looks fine" or "it's bad". They should measure how often this leads to a referral, and they'll see totally different results. There's a long history in papers like this of defining a bad task and then saying that humans can't do it.
* They have a biased sample of doctors that is highly skewed toward people with no experience. Look at Figure 1. A lot of those doctors have about as much experience detecting melanoma as you do. They just don't do this task.
* "Electronic questionnaires" are a junk way of gathering data for this task. Doctors are busy. What makes the authors think they'll be as careful with this task as with a real patient? Real patients also have histories, etc.

I could go on. The number of problems with this paper is just interminable (54% of their images were labeled non-cancer just because a bunch of people looked at them. If people are so wrong, why are they trusting these images? I would only trust biopsies). This isn't coming to a doctor's office anywhere near you. It's just a publicity stunt by clueless people. Please collaborate with some ML folks before publishing work like this! There are so many of us!
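To make the imbalance point concrete, here is a minimal Python sketch, assuming a hypothetical test set with the same 80/20 split (the counts are illustrative, not the paper's data, and the helper names are made up for this example). A classifier that always answers "no" scores 80% accuracy and perfect specificity while catching zero melanomas; on a balanced set the same classifier drops to 50% accuracy.

```python
# Hypothetical test sets: 1 = melanoma ("yes"), 0 = benign ("no").
# Counts are illustrative only, not taken from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Precision is undefined when nothing is predicted positive; report 0.0 here.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # a.k.a. sensitivity
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


imbalanced_true = [0] * 80 + [1] * 20  # 80% "no", 20% "yes"
balanced_true = [0] * 50 + [1] * 50    # 50/50 for comparison
always_no = [0] * 100                  # a "network" that never says yes

print(summarize(imbalanced_true, always_no))
print(summarize(balanced_true, always_no))
```

The first call reports 0.8 accuracy and 1.0 specificity alongside 0.0 recall; the second shows the same useless classifier at 0.5 accuracy once the classes are balanced.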
If possible, you should write a critical response to this paper, focusing on its methodological flaws, and send it to the editors. It doesn't have to be long; critical responses are usually a couple of pages at most. This is likely the most effective way of removing (or at the very least, heavily qualifying) bad science in research journals.