by narrator
751 days ago
Can't you run the model in reverse? Brute-force through various random parameters to the model to figure out which ones make a difference? Sure, it could have absurd dimensionality, but in that case it's unlikely anyone could even grasp where to begin. After all, AlphaGo couldn't write a book for humans about how to play Go as well as it can.
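For what it's worth, the brute-force idea can be sketched in a few lines. This is only a toy under stated assumptions: `black_box` is a made-up three-input function standing in for an opaque model, and `sensitivity` is a hypothetical helper that resamples one input at a time and records how far the output moves. It says which inputs matter, not why, and the cost grows with every input you add.

```python
import random

def black_box(x):
    # Hypothetical opaque model (an assumption for illustration):
    # inputs 0 and 2 matter, input 1 is ignored entirely.
    return 3.0 * x[0] + 0.0 * x[1] - 2.0 * x[2] * x[2]

def sensitivity(model, n_features, n_samples=2000, seed=0):
    """Average absolute output change when a single input is resampled at random."""
    rng = random.Random(seed)
    totals = [0.0] * n_features
    for _ in range(n_samples):
        # Draw a random point, then perturb one coordinate at a time.
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        base = model(x)
        for i in range(n_features):
            perturbed = list(x)
            perturbed[i] = rng.uniform(-1.0, 1.0)
            totals[i] += abs(model(perturbed) - base)
    return [t / n_samples for t in totals]

scores = sensitivity(black_box, n_features=3)
for i, s in enumerate(scores):
    print(f"input {i}: mean |delta output| = {s:.3f}")
```

The ignored input scores zero and the other two do not, which is about all this kind of probing can tell you. It gives a ranking, not an explanation a person could learn from.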
Sadly, insight is always lost. In a noisy world, even with the best regularization, some fitting to the noise, or to higher-order features that describe it, is inevitable when maximizing prediction accuracy, especially if the chosen architecture lacks the right tools to model it (like transformers adapting to lacking registers [1]) yet has a lot of parameters.
What's worse, bad expectations are often much worse than none. If your loan is denied by a fully opaque black box, you may be offered recourse to get an actual human on the case. If instead they've trained an interpretable student [2], then, whether by intentional manipulation or by pure luck, it may have obscured the effect of some meta-feature that likely corresponds to something like race, thus whitewashing the stochastically racist black box [3].
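A toy sketch of how that can happen, on synthetic data. Everything here is an assumption made for illustration: the `group` attribute, the income distributions, the `black_box` rule, and the `fit_income_stump` helper. It is not the method of [2] or [3], just a one-threshold student that is never shown `group` and still mimics the black box closely.

```python
import random

rng = random.Random(0)

def make_applicant():
    # Hypothetical sensitive attribute; income is correlated with it,
    # so income can stand in for it in an explanation.
    group = rng.random() < 0.5
    income = rng.gauss(40.0 if group else 70.0, 8.0)
    return {"group": group, "income": income}

def black_box(applicant):
    # Hypothetical opaque model that in effect decides on the group alone.
    return not applicant["group"]  # True means "approve"

applicants = [make_applicant() for _ in range(5000)]
labels = [black_box(a) for a in applicants]

def fit_income_stump(data, targets):
    """Find the threshold t where 'approve if income >= t' best agrees with targets."""
    best_t, best_fidelity = None, -1.0
    for t in range(20, 91):
        agree = sum((a["income"] >= t) == y for a, y in zip(data, targets))
        fidelity = agree / len(data)
        if fidelity > best_fidelity:
            best_t, best_fidelity = t, fidelity
    return best_t, best_fidelity

threshold, fidelity = fit_income_stump(applicants, labels)
print(f"student rule: approve if income >= {threshold}")
print(f"agreement with black box: {fidelity:.1%}")
```

The student agrees with the black box on most applicants and its rule only ever mentions income, so the explanation looks reasonable while saying nothing about what the black box actually keyed on.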
[0] "Interpretability in ML: A Broad Overview" https://www.lesswrong.com/posts/57fTWCpsAyjeAimTp/interpreta...
[1] "Thread: Circuits" https://distill.pub/2020/circuits/
[2] "'Why Should I Trust You?': Explaining the Predictions of Any Classifier" https://arxiv.org/abs/1602.04938
[3] "Fairwashing: the risk of rationalization" https://proceedings.mlr.press/v97/aivodji19a