	by uoaei 2243 days ago
	I think there ought to be a distinction between explainable models (what does this neuron activate most strongly on?) and interpretable models (what do the model's parameters tell me about the data?). The distinction is this: explanations can only be made ex post facto, about why the model acted a certain way on specific inputs; interpretations can be made from the model's parameters themselves, e.g., "feature X is very important and feature Y is almost always ignored, and I know this because my NN is one layer deep and all the weights for feature X are large in magnitude while all the weights for feature Y are small in magnitude." An interpretation does not require feeding in specific inputs and studying specific outputs, so it is a different concept, which is why I am suggesting we make the distinction explicit.
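
	A rough sketch of that distinction, not anything from the comment itself: the data, the two feature columns, and the least-squares fit are all assumptions made up for illustration. It also assumes both features are on the same scale, since otherwise comparing weight magnitudes is not meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: column 0 plays "feature X", column 1 plays "feature Y".
# Both are standard normal, so their weights are directly comparable.
n = 1000
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=n)  # the target ignores feature Y

# A "one layer deep" model: a single linear map, fit by least squares (no bias).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Interpretation: read importance straight off the parameters.
# No particular input is needed for this step.
importance = np.abs(w)
print("weight magnitudes:", importance)

# Explanation: after the fact, attribute one specific output to its inputs.
x = np.array([0.5, 4.0])   # a made-up input
contributions = w * x      # per-feature share of this one prediction
print("prediction:", x @ w, "contributions:", contributions)
```

	The weight magnitudes say something about the model as a whole, while the contributions only describe what happened for that one input.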

1 comment

Zachary Lipton has a really good taxonomy of the different things people refer to when they talk about interpretability and explainability here: