
Yeah, so local linear approximations are what we and LIME are using as explanations, but that's not what an explanation is in general.
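To make "local linear approximation" concrete, here's a rough sketch of the general idea, not the paper's code and not LIME's actual implementation. It assumes a toy `black_box` function standing in for a real model, and the helper name `local_linear_explanation`, the Gaussian proximity weights, and the `scale` parameter are all my own illustrative choices. It perturbs a point, queries the model, and fits a weighted linear model to the responses; the fitted coefficients are the "explanation."

```python
import numpy as np

def black_box(X):
    # Hypothetical nonlinear model; stands in for a real classifier's score.
    return np.tanh(2.0 * X[:, 0]) + 0.1 * X[:, 1] ** 2

def local_linear_explanation(f, x0, n_samples=2000, scale=0.1, seed=0):
    """Fit y ~ intercept + (x - x0) @ coef on samples drawn near x0."""
    rng = np.random.default_rng(seed)
    # Sample perturbed inputs in a small neighborhood of x0.
    X = x0 + scale * rng.standard_normal((n_samples, x0.size))
    y = f(X)
    # Weight each sample by its proximity to x0 (Gaussian kernel, assumed).
    d2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * scale ** 2))
    # Weighted least squares via row scaling by sqrt(weight).
    A = np.hstack([np.ones((n_samples, 1)), X - x0])
    sw = np.sqrt(w)
    sol, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return sol[0], sol[1:]

x0 = np.array([0.0, 1.0])
intercept, coef = local_linear_explanation(black_box, x0)
print("intercept:", intercept)
print("per-feature coefficients:", coef)
```

The coefficients only say how the output changes near `x0`; at a different point the same model can get a very different linear fit, which is why this is a partial explanation.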

In the paper we do define an explanation as basically any artifact that "provides reliable information about the model’s implicit decision rules for a given prediction." It's kind of a rough and over-general definition, but it gets to the idea that explanations can be partial. All we want to do is turn a completely black-box model into something slightly more transparent.

Ideally, we could have explanations at a higher level of abstraction, e.g. "this image is a picture of a husky and not a wolf because of the shape of the nose and the color of the coat," but a neural network has no idea what "nose" and "coat" mean. Sometimes its intermediate layers will end up corresponding to meaningful abstract concepts like that, but not always.