by anewhnaccount2
3384 days ago
It also explains what an explanation is in this context (which is what was asked): a local linear approximation of the model. It also has a diagram, which is nice. Obviously it's not the one being discussed here though -- I'd hardly be adding useful information if I just linked to the submission again as a reply.
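To make "local linear approximation" concrete, here is a rough sketch in Python (NumPy only). It is not the code from either article, just one common way to do it under stated assumptions: sample points near the input you care about, weight them by how close they are, and fit a weighted linear model whose slopes serve as the explanation. `black_box`, `local_linear_explanation`, and the `scale` parameter are all made up for illustration.

```python
import numpy as np

def black_box(X):
    # Hypothetical stand-in for an opaque model: nonlinear in both features.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_linear_explanation(predict, x0, n_samples=2000, scale=0.1, seed=0):
    """Fit a proximity-weighted linear model to `predict` around the point x0."""
    rng = np.random.default_rng(seed)
    # Sample perturbed inputs in a small neighbourhood of x0.
    X = x0 + rng.normal(0.0, scale, size=(n_samples, x0.size))
    y = predict(X)
    # Weight each sample by its closeness to x0 (Gaussian kernel; an assumption).
    d2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-d2 / (2 * scale ** 2))
    # Weighted least squares: scale rows by sqrt(w), then solve ordinary lstsq.
    A = np.hstack([np.ones((n_samples, 1)), X - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    # coef[0] is the local intercept, coef[1:] are the per-feature slopes.
    return coef[0], coef[1:]

x0 = np.array([0.5, 1.0])
intercept, slopes = local_linear_explanation(black_box, x0)
print("local slopes:", slopes)
```

The slopes say how much each feature moves the prediction near `x0` and nowhere else, which is why this kind of explanation is only partial: at a different input the same model can yield very different slopes.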
In the paper we do define an explanation as basically any artifact that "provides reliable information about the model’s implicit decision rules for a given prediction." It's kind of a rough and over-general definition, but it gets to the idea that explanations can be partial. All we want to do is turn a completely black-box model into something slightly more transparent.
Ideally, we could have explanations at a higher level of abstraction, e.g. "this image is a picture of a husky and not a wolf because of the shape of the nose and the color of the coat," but a neural network has no idea what "nose" and "coat" mean. Sometimes its intermediate layers will end up corresponding to meaningful abstract concepts like that, but not always.