Two cents as an MD-(CS)PhD student studying what I've heard referred to as "the last mile problem."

My stack trace of investigation:

- The model is good, we just need to get the doctors to trust the model.
- The model is good, we need to figure out how to build an informed trust in the model (so as to avoid automation bias).
- The model is good, we need informed trust, but we can't tackle the trust issue without first figuring out a realistic deployment scenario.
- The model is good, we need informed trust, we need a realistic deployment scenario, but there are some infrastructural issues that make deployment incredibly difficult. After painstaking work with a real-life EHR system, sanity-check model inference against a realistic deployment scenario.
- Holy crap, the model is bad and not at all suitable for deployment. 0.95 AUC, subject of a previous publication, and fails on really obvious cases.

My summary so far of "why?": the assumptions going into model training are wildly out of sync with the assumptions of deployment. It's "Hidden Tech Debt in ML" [1] on steroids.

[1] https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f...
Underspecification Presents Challenges for Credibility in Modern Machine Learning, D'Amour et al., https://arxiv.org/abs/2011.03395
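
To make the "great AUC, fails on obvious cases" failure mode concrete, here is a toy sketch of one way training and deployment assumptions can drift apart. Everything in it is made up for illustration: the `Patient` fields, the `treatment_ordered` flag (a hypothetical field that only gets filled in after a clinician has already acted), the hand-written `model_score`, and the 0.5 alert threshold. It is not the model or the EHR data from the comment above, just one plausible shape of the problem.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Patient:
    heart_rate: int          # beats per minute
    treatment_ordered: bool  # hypothetical field, populated only after a clinician acts
    label: int               # 1 = had the condition, 0 = did not


def model_score(p: Patient) -> float:
    """Toy stand-in for a trained model that leans heavily on the late-arriving field."""
    return 0.8 * p.treatment_ordered + 0.2 * min(p.heart_rate, 200) / 200


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Retrospective chart data: the flag is already filled in for most sick patients.
retrospective = [
    Patient(130, True, 1), Patient(95, True, 1), Patient(120, True, 1), Patient(88, False, 1),
    Patient(70, False, 0), Patient(82, False, 0), Patient(90, False, 0), Patient(75, False, 0),
]
retro_auc = auc([model_score(p) for p in retrospective], [p.label for p in retrospective])

# Deployment view: at prediction time nobody has ordered anything yet,
# so the same patients arrive with the flag unset.
deployment = [replace(p, treatment_ordered=False) for p in retrospective]
THRESHOLD = 0.5  # assumed alert threshold
flagged = [p for p in deployment if model_score(p) >= THRESHOLD]

# An "obvious" case a clinician would never miss, as seen at deployment time.
obvious_case = Patient(heart_rate=150, treatment_ordered=False, label=1)
obvious_score = model_score(obvious_case)

print(f"retrospective AUC: {retro_auc:.4f}")
print(f"patients flagged at deployment: {len(flagged)} of {len(deployment)}")
print(f"score for the obvious case: {obvious_score:.2f}")
```

The point of the sketch is that the headline metric and the deployment behavior are measured on different inputs. The retrospective AUC looks fine because the flag is present, but once the inputs are shaped the way they actually arrive in production, the model never crosses the alert threshold, even for the textbook case. A handful of hand-written "obvious" patients run through the real inference path is a cheap check that an aggregate metric will not give you.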