Hacker News
by narrator 751 days ago
Can't you run the model in reverse? Brute force through various random parameters to the model to figure out which ones make a difference? Sure, it could have absurd dimensionality, but then one might not even grasp where to begin. After all, AlphaGo couldn't write a book for humans about how to play Go as well as it does.
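The brute-force probing described above can be sketched as a crude one-at-a-time sensitivity test: redraw a single input at random, keep the others fixed, and average how much the output moves. The `black_box` function and its coefficients are hypothetical stand-ins for a model we can only query, and this is a minimal illustration, not any particular library's method.

```python
import random

def black_box(x):
    # Hypothetical opaque model: strong effect of x[0], weak effect of x[1],
    # and x[2] is ignored entirely. We pretend we can only call it.
    return 3.0 * x[0] ** 2 + 0.5 * x[1]

def sensitivity(model, n_inputs, n_samples=1000, seed=0):
    """Mean absolute change in output when one input is redrawn at random."""
    rng = random.Random(seed)
    scores = [0.0] * n_inputs
    for _ in range(n_samples):
        base = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        y0 = model(base)
        for i in range(n_inputs):
            probe = list(base)
            probe[i] = rng.uniform(-1.0, 1.0)  # redraw only input i
            scores[i] += abs(model(probe) - y0)
    return [s / n_samples for s in scores]

scores = sensitivity(black_box, n_inputs=3)
print([round(s, 3) for s in scores])
```

Because `black_box` never reads its third input, that score is exactly zero, and the first input's score should dominate. The cost grows with the number of inputs times the number of samples, and the scores alone do not describe how inputs interact, which is part of why this kind of probing rarely adds up to human-level insight.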

That's what model interpretability research is. You can train an interpretable student model to mimic the uninterpretable teacher, look at layer activations and how they correspond to certain features, or apply a hundred other domain-specific methods depending on your architecture. [0]
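The interpretable-student idea (often called a global surrogate, or distillation) can be sketched with about the simplest interpretable model there is: a single-feature threshold rule fitted to the teacher's outputs rather than to ground truth. The `teacher` function, sample sizes, and seed are hypothetical; real work would typically use richer students such as shallow trees or sparse linear models.

```python
import random

def teacher(x):
    # Hypothetical opaque classifier: we may query it but not inspect it.
    return 1 if x[0] + 0.3 * x[1] ** 2 > 0.5 else 0

def fit_stump(X, y):
    """Find the one-feature rule 'x[f] > t' that agrees most often with labels y."""
    best_acc, best_f, best_t = -1.0, None, None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            agree = sum((row[f] > t) == (label == 1) for row, label in zip(X, y))
            acc = agree / len(X)
            if acc > best_acc:
                best_acc, best_f, best_t = acc, f, t
    return best_acc, best_f, best_t

rng = random.Random(0)
X_train = [[rng.random(), rng.random()] for _ in range(300)]
y_train = [teacher(x) for x in X_train]  # the teacher labels the data
train_fidelity, feature, threshold = fit_stump(X_train, y_train)

# Fidelity on fresh queries: how often the student's rule matches the teacher.
X_test = [[rng.random(), rng.random()] for _ in range(1000)]
test_fidelity = sum(
    (x[feature] > threshold) == (teacher(x) == 1) for x in X_test
) / len(X_test)

print(f"student rule: predict 1 if x[{feature}] > {threshold:.2f}")
print(f"fidelity: train {train_fidelity:.2f}, held-out {test_fidelity:.2f}")
```

Fidelity measures agreement with the teacher, not correctness. The cases near the teacher's curved boundary, where the student's straight cut disagrees, are exactly the part of the behaviour the readable rule leaves out.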

Sadly, some insight is always lost. The world is noisy, and even with the best regularization, some fitting to that noise (or to higher-order features that describe it) is inevitable when you maximize prediction accuracy, especially if your architecture lacks the right tools to model it (like transformers adapting to a lack of registers [1]) and yet has a lot of parameters.

What's worse, false expectations are often much worse than none. If your loan is denied by a fully opaque black box, you may be offered recourse: getting an actual human on the case. If they've trained an interpretable student [2] instead, then, whether by intentional manipulation or by pure luck, it may obscure the effect of some meta-feature likely corresponding to something like race, thus whitewashing the stochastically racist black box. [3]
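A toy illustration of that risk, assuming entirely hypothetical data: a black box applies a stricter income cutoff to applicants flagged by a proxy feature, and the "explanation" is the single income threshold that best reproduces its decisions with the proxy left out. This is far cruder than what [3] studies; it only shows that a high-fidelity explanation need not reveal what drives a difference between groups.

```python
import random

def black_box(income, flagged):
    # Hypothetical model: an income cutoff that is stricter for applicants
    # flagged by a proxy feature (standing in for a protected attribute).
    cutoff = 0.6 if flagged else 0.4
    return income > cutoff

rng = random.Random(1)
# Hypothetical applicants: (income in [0, 1), flagged with probability 0.2).
applicants = [(rng.random(), rng.random() < 0.2) for _ in range(10_000)]
decisions = [black_box(inc, flag) for inc, flag in applicants]

def agreement(t):
    """Fraction of applicants where 'income > t' matches the black box."""
    hits = sum((inc > t) == d for (inc, _), d in zip(applicants, decisions))
    return hits / len(applicants)

# The "explanation": the best single income threshold, with the proxy left out.
threshold = max((i / 100 for i in range(101)), key=agreement)
fidelity = agreement(threshold)

def student(income, flagged):
    return income > threshold  # ignores `flagged` by construction

def approval_gap(predict):
    """Approval rate of unflagged applicants minus that of flagged ones."""
    rates = {}
    for group in (False, True):
        members = [a for a in applicants if a[1] == group]
        rates[group] = sum(predict(*a) for a in members) / len(members)
    return rates[False] - rates[True]

bb_gap = approval_gap(black_box)
student_gap = approval_gap(student)
print(f"student rule: approve if income > {threshold:.2f} (fidelity {fidelity:.3f})")
print(f"approval gap: black box {bb_gap:.3f}, student {student_gap:.3f}")
```

In this toy the student agrees with the black box on the large majority of applicants and never mentions the proxy, yet the black box approves the flagged group at a noticeably lower rate. Only comparing the black box's actual outcomes by group surfaces the gap.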

[0] "Interpretability in ML: A Broad Overview" https://www.lesswrong.com/posts/57fTWCpsAyjeAimTp/interpreta...
[1] "Thread: Circuits" https://distill.pub/2020/circuits/
[2] "'Why Should I Trust You?': Explaining the Predictions of Any Classifier" https://arxiv.org/abs/1602.04938
[3] "Fairwashing: the risk of rationalization" https://proceedings.mlr.press/v97/aivodji19a

This reminds me of another point I make when teaching: a perfect model of the entire world would be just as inscrutable as the world itself.

I think having multiple layers of abstraction can be really useful; I've done this myself for some highly complex agent-based models. In some sense, these approaches can also be thought of as "in silico experiments".

You have a model that is complex and relatively inscrutable, just like the real world, but unlike the real world, you can run lots of "experiments" quite cheaply!
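A minimal in silico experiment, assuming a hypothetical toy contagion model (the number of agents, contacts per step, and transmission and recovery probabilities are all made up for illustration). Each run is cheap, so we can sweep one parameter and average over hundreds of replicates, something no one could do with the real world.

```python
import random

def run_outbreak(n_agents, p_transmit, contacts=3, p_recover=0.5, rng=None):
    """One run of a toy contagion model; returns the fraction ever infected."""
    if rng is None:
        rng = random.Random()
    susceptible = [True] * n_agents
    susceptible[0] = False  # agent 0 starts infected
    infected = [0]
    ever_infected = 1
    while infected:
        newly = []
        for _agent in infected:
            for _ in range(contacts):  # meet random agents
                other = rng.randrange(n_agents)
                if susceptible[other] and rng.random() < p_transmit:
                    susceptible[other] = False
                    newly.append(other)
        # each currently infected agent recovers with probability p_recover
        still = [a for a in infected if rng.random() >= p_recover]
        ever_infected += len(newly)
        infected = still + newly
    return ever_infected / n_agents

def sweep(p_values, n_agents=200, replicates=200, seed=42):
    """Average final outbreak size over many cheap replicate runs per setting."""
    rng = random.Random(seed)
    return {
        p: sum(run_outbreak(n_agents, p, rng=rng) for _ in range(replicates)) / replicates
        for p in p_values
    }

results = sweep([0.05, 0.2, 0.5])
for p, mean_size in results.items():
    print(f"p_transmit={p}: mean fraction ever infected {mean_size:.3f}")
```

The averaged outbreak sizes should rise with `p_transmit`, while individual runs at the same setting can differ wildly, which is why the replicates matter as much as the sweep itself.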