by anewhnaccount2 3384 days ago
It also explains what an explanation is in this context (which is what was asked): a local linear approximation of the model. Additionally, it has a diagram, which is nice. Obviously it's not the one being discussed here, though -- I'd hardly be adding useful information if I just linked to the submission again as a reply.
1 comments

Yeah, so local linear approximations are what we and LIME are using as explanations, but that's not what an explanation is in general.
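
For concreteness, here's a rough sketch of what "local linear approximation" can mean in practice. To be clear, this is not LIME's actual implementation and not the method from the paper; it's just a proximity-weighted least-squares fit around one input, and `black_box`, `local_linear_explanation`, and all the parameter values are made up for illustration.

```python
import numpy as np

def black_box(X):
    # Hypothetical stand-in for an opaque model: a nonlinear function of two features.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_linear_explanation(predict, x0, n_samples=2000, scale=0.1,
                             kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model to `predict` around the point x0.

    Returns (intercept, weights). The weights say how much each feature
    moves the prediction in the neighborhood of x0.
    All defaults here are illustrative assumptions, not tuned values.
    """
    rng = np.random.default_rng(seed)
    # Sample perturbed inputs in a small neighborhood of x0.
    X = x0 + rng.normal(scale=scale, size=(n_samples, x0.size))
    y = predict(X)
    # Weight each sample by its closeness to x0 (Gaussian kernel).
    dist_sq = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-dist_sq / kernel_width ** 2)
    # Weighted least squares on features centered at x0, plus an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), X - x0])
    sqrt_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef[0], coef[1:]

x0 = np.array([0.0, 1.0])
intercept, weights = local_linear_explanation(black_box, x0)
print("local feature weights:", weights)
```

For this toy function the weights should land close to its gradient at `x0`, roughly (1, 2), which is the whole idea: the linear model is only trusted near the point being explained.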

In the paper we do define an explanation as basically any artifact that "provides reliable information about the model’s implicit decision rules for a given prediction." It's kind of a rough and over-general definition, but it gets to the idea that explanations can be partial. All we want to do is turn a completely black-box model into something slightly more transparent.

Ideally, we could have explanations that were at a higher level of abstraction, e.g. "this image is a picture of a husky and not a wolf because of the shape of the nose and the color of the coat," but a neural network has no idea what "nose" and "coat" mean. Sometimes its intermediate layers will end up corresponding to meaningful abstract concepts like that, but not always.