by kendallpark 1879 days ago
Two cents as an MD-(CS)PhD student studying what I've heard referred to as "the last mile problem."

My stack trace of investigation:

- The model is good, we just need to get the doctors to trust the model.

- The model is good, we need to figure out how to build an informed trust in the model (so as to avoid automation bias).

- The model is good, we need informed trust, but we can't tackle the trust issue without first figuring out a realistic deployment scenario.

- The model is good, we need informed trust, we need a realistic deployment scenario, but there are some infrastructural issues that make deployment incredibly difficult.

After painstaking work with a real-life EHR system, sanity-check model inference against a realistic deployment scenario.

- Holy crap, the model is bad and not at all suitable for deployment. 0.95 AUC, subject of a previous publication, and fails on really obvious cases.

My summary so far of "why?": assumptions going into model training are wildly out of sync with the assumptions of deployment. It's "Hidden Tech Debt in ML" [1] on steroids.

[1] https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f...
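To make the "0.95 AUC but fails on obvious cases" point concrete, here is a minimal sketch in plain Python of how an aggregate AUC can look excellent while a small slice of cases is ranked backwards. Every number and the "common"/"rare" slice names are invented for illustration; none of this comes from the model or EHR data described above.

```python
def auc(labels, scores):
    """Pairwise AUC: the chance that a random positive case is scored
    above a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical (label, score, slice) rows. The "common" slice looks like the
# training data; the "rare" slice stands in for a deployment-only situation.
cases = (
    [(1, 0.9, "common")] * 9
    + [(0, 0.1, "common")] * 9
    + [(1, 0.2, "rare"), (0, 0.8, "rare")]
)


def auc_for(rows):
    """AUC over a list of (label, score, slice) rows."""
    return auc([y for y, _, _ in rows], [s for _, s, _ in rows])


overall_auc = auc_for(cases)
rare_auc = auc_for([r for r in cases if r[2] == "rare"])

print(f"overall AUC: {overall_auc:.2f}")  # 0.99
print(f"rare-slice AUC: {rare_auc:.2f}")  # 0.00
```

The headline number barely moves because the rare slice contributes so few of the positive/negative pairs, which is why a per-slice check against realistic deployment inputs can surface failures that a single aggregate metric hides.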

2 comments

You've probably seen it, but there's a more recent, related paper (I think with some of the same authors) about inherent features of modern ML that make models so fragile, even when they test OK:

Underspecification Presents Challenges for Credibility in Modern Machine Learning, D'Amour et al., https://arxiv.org/abs/2011.03395

I had not seen it yet, excited to read it! Thanks!

My question: why don't they just specify that this ML model was trained on images from a particular type of medical equipment? Couldn't they make it part of the SLA that the same type of equipment be used in the field as was used to obtain the training images?