I recently published a paper where we explain how an FDA-approved prediction model, built into a widely used cardiac monitor, was developed with an incredibly biased method.
Basically, the training and validation data were engineered so that an important range of one of the predictor variables was present in only one of the outcomes, making perfect prediction possible for these cases.
Fair question. The model we comment on suffers both from the problem described in the article and from a more severe problem:
The developers sampled obvious cases of hypotension and non-hypotension and trained the model to distinguish between them. They also validated it on data that was similarly dichotomous. In reality, the outcome often falls between these two scenarios.
But worse, they also introduced a more severe problem, where a range of an important predictor is only available in the hypotension outcome.
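To make the mechanism concrete, here is a toy simulation. Everything in it is an assumption for illustration: the 65 and 75 mmHg thresholds, the distributions, and the function name `simulate_selected_sample` are made up and are not the values or code from the paper or the device. It only shows how keeping "obvious" cases can leave a range of the predictor that exists in just one outcome.

```python
import random

# Illustrative assumptions only, not the thresholds used by the actual model.
HYPOTENSION_BELOW = 65     # assumed event threshold (mmHg)
CLEAR_NONEVENT_ABOVE = 75  # assumed "obviously fine" threshold (mmHg)

def simulate_selected_sample(n=5000, seed=0):
    """Return (current_map, outcome) pairs that survive the biased selection."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        current_map = rng.gauss(80, 12)             # hypothetical MAP now
        future_map = current_map + rng.gauss(0, 8)  # hypothetical MAP a bit later
        if future_map < HYPOTENSION_BELOW:
            # Hypotension events are kept whatever the current MAP is.
            sample.append((current_map, 1))
        elif current_map > CLEAR_NONEVENT_ABOVE and future_map > CLEAR_NONEVENT_ABOVE:
            # Non-events are kept only when the patient is clearly fine throughout.
            sample.append((current_map, 0))
        # Everything in between is silently dropped.
    return sample

sample = simulate_selected_sample()
lowest_nonevent_map = min(m for m, y in sample if y == 0)
low_map_outcomes = [y for m, y in sample if m <= CLEAR_NONEVENT_ABOVE]

print("lowest MAP among non-events:", round(lowest_nonevent_map, 1))
print("cases with MAP <= 75:", len(low_map_outcomes))
print("of which hypotension:", sum(low_map_outcomes))
```

In this toy sample, every case with a current MAP at or below the upper threshold is a hypotension case by construction, so a rule that only looks at current MAP gets those cases right every time. In unselected data, many patients in that range would not go on to become hypotensive.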
I can only imagine the frustration. Just getting this through peer review took half a year, but at least there was the academic currency of a publication to motivate me.
I’m not sure where you got this form of communication where you respond to everything with a question, and I assume you mean well, but it comes across as patronizing and de-humanizing to try to follow these “rules to winning arguments passively”, or whatever it is.
Indeed, the confusion here is (I think) because your first comment
> Sorry for asking, but how is this relevant to the article?
> PS: how something “sounds” is really difficult to say in a written medium. It might say more about the reader than the writer.
No, it’s not difficult. And no, it’s not the reader, when multiple readers all agree on the same interpretation of the writer.
It might have been unintentional on the part of the writer, but that doesn’t make it “difficult”.
Or the reader’s fault.
I think what you're trying to encourage is open-ended discussion? It's my opinion that this only tends to work IRL or in online mediums with more moderation, e.g. Wikipedia or Stack Overflow.
Random open-ended discussion can be good, but I bet it's wise to assume that most random musings aren't really as interesting as you might think.
The problem is quite subtle, though obvious in retrospect. I've seen a paper from a separate, academic research group build a similar model with the exact same problem.
The problem would, however, have been clear if the model had been compared to simply using the current mean arterial pressure (MAP) as a predictor of hypotension, because MAP is the problematic predictor variable. Instead, the model was only compared to short-term changes in MAP (ΔMAP), which is obviously a nonsensical comparator and has an AUROC of ~0.55.
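The baseline comparison can be sketched on toy data. This is a hedged illustration, not a reproduction of the paper: the thresholds (65 and 75 mmHg), the distributions, and the helper names `auroc` and `simulate` are all assumptions. It applies the same kind of biased selection and then scores two trivial predictors, current MAP and ΔMAP, by AUROC.

```python
import random

def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney) formula, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def simulate(n=5000, seed=1):
    """Toy (current_map, delta_map, outcome) rows after biased selection."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        previous_map = rng.gauss(80, 12)             # hypothetical earlier MAP
        current_map = previous_map + rng.gauss(0, 3)
        future_map = current_map + rng.gauss(0, 8)
        delta_map = current_map - previous_map       # short-term change in MAP
        if future_map < 65:                          # assumed event threshold
            rows.append((current_map, delta_map, 1))
        elif current_map > 75 and future_map > 75:   # assumed "clearly fine" rule
            rows.append((current_map, delta_map, 0))
    return rows

rows = simulate()
labels = [y for _, _, y in rows]
# Lower MAP (or a falling MAP) means higher risk, so negate to get a risk score.
auc_map = auroc([-m for m, _, _ in rows], labels)
auc_delta = auroc([-d for _, d, _ in rows], labels)

print(f"AUROC, current MAP: {auc_map:.2f}")
print(f"AUROC, delta MAP:   {auc_delta:.2f}")
```

On data selected like this, current MAP on its own should discriminate far better than ΔMAP, which is why a comparison against ΔMAP alone hides the problem while a comparison against MAP would have exposed it.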
Hm, reading the linked tweets, the problem seems like a big screaming red target on the side of a white barn, not a feature engineering subtlety. It seems like the typical case of the drunk guy looking for his keys under the streetlight. (Having insufficient data, and comparing the model to an arbitrarily picked one that just happens to be even worse. And then everyone, including the FDA, patting them on the back.)
I think it is the general incompetence of the "academia + R&D biz + regulation pipeline". (In the land of the blind the one-eyed is king, etc.)
It's sort of inevitable in such a non-teleological process. As in, each step in it serves its own purpose, and so the whole thing doesn't really serve the purpose that we like to assume for it, i.e. giving us great, thoughtful inventions. That's why it took so long to stop the Theranos train, that's why it takes so fucking long to roll out polyvalent vaccines (i.e. all-in-one vaccines), and so on. (I'm picking on medtech here, but there are many others: the Boeing + FAA MCAS fuckup, the absolute limpdick paralysis of nuclear power - it needed a combination of half the world on fire + prelude-to-WWIII to get it moving again, and so on.)