by uoaei 2243 days ago
I think there ought to be a distinction between explainable (what does this neuron activate most strongly on?) models and interpretable (what do the model's parameters tell me about the data?) models.

The distinction is this: explanations can only be made ex post facto, about why the model acted a certain way on specific inputs; interpretations can be made from the model's parameters themselves, e.g., "feature X is very important and feature Y is almost always ignored, and I know this because my NN is one layer deep, all the weights for feature X are large in magnitude, and all the weights for feature Y are small in magnitude." This does not require feeding specific inputs or studying specific outputs, so it is a different concept, which is why I am suggesting we make the distinction explicit.
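
To make the contrast concrete, here is a rough sketch, not anyone's actual method. It assumes a one-layer linear model fit by least squares on synthetic data; the names `feature_x`, `feature_y`, `interpret`, and `explain` are made up for illustration. `interpret` looks only at the weights, while `explain` needs a specific input.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w (one layer, no nonlinearity, no bias)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def interpret(w, names):
    """Interpretation: rank features by weight magnitude. No input is needed."""
    order = np.argsort(-np.abs(w))
    return [names[i] for i in order]

def explain(w, x, names):
    """Explanation: per-feature contribution to the output for one input x."""
    return dict(zip(names, w * x))

# Synthetic data (an assumption): the target depends strongly on the first
# feature and barely on the second.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(scale=0.1, size=500)

names = ["feature_x", "feature_y"]  # hypothetical feature names
w = fit_linear(X, y)

ranking = interpret(w, names)                       # uses parameters only
contrib = explain(w, np.array([0.0, 5.0]), names)   # uses one specific input
```

For the input `[0.0, 5.0]`, `feature_x` contributes exactly zero to the output even though its weight is by far the largest, so the per-input explanation and the parameter-based interpretation answer different questions. Comparing weight magnitudes like this only makes sense when the features are on comparable scales, which they are here by construction.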

1 comment

Zachary Lipton has a really good taxonomy of the different things people refer to when they talk about interpretability and explainability here:

https://arxiv.org/pdf/1606.03490.pdf