by function_seven 3327 days ago
This is why I want any ML device to be able to explain itself. It could train on your before-and-after examples and come up with a list of what it thinks you want it to do.

For your example, it could list:

    “Remove interior spaces from each item”
or it could say:

    “Remove the middle character from any 7-character strings to make them 6 characters in length”
You would be able to do something with that.
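
A rough sketch of what that could look like, assuming a small hand-written library of candidate rules. The rule list, the sample strings, and the `consistent_rules` helper are all made up for illustration and are not how any real tool does it. It keeps every rule that reproduces all the before-and-after examples and prints its description:

```python
import re

# Hypothetical rule library: (human-readable description, transform function).
CANDIDATE_RULES = [
    ("Remove interior spaces from each item",
     lambda s: s.replace(" ", "")),
    ("Remove the middle character from any 7-character string",
     lambda s: s[:3] + s[4:] if len(s) == 7 else s),
    ("Remove every non-digit character",
     lambda s: re.sub(r"\D", "", s)),
    ("Keep only the first 6 characters",
     lambda s: s[:6]),
]

def consistent_rules(examples):
    """Return descriptions of all rules that reproduce every (before, after) pair."""
    return [desc for desc, fn in CANDIDATE_RULES
            if all(fn(before) == after for before, after in examples)]

# Illustrative before-and-after examples supplied by the user.
examples = [("123 456", "123456"), ("234567", "234567"), ("345 678", "345678")]
for desc in consistent_rules(examples):
    print(desc)
```

Several rules survive on these examples, which is the point: the list is something you can read and choose from rather than a silent guess.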

DataWrangler [0] (now productionized as Trifacta Wrangler [1]) does pretty much that. It gives you suggested lists of transformations such as "Cut from position 18-25 as the Year column", that you can chain together as your data cleaning pipeline.

[0]: http://vis.stanford.edu/wrangler/

[1]: https://www.trifacta.com/products/wrangler/

This add-on already does that. (Did nobody try it??) It shows in the pane a list of candidate transforms, seemingly ranked in some descending plausibility order. They have semi-readable names. You get to choose one to apply.
Neural nets are infamous for not doing this.

Learning algorithms that produce decision trees are usually used in this situation.

This might be a dumb question, but let's say that for whatever reason on a specific problem it's much easier to train a neural network that generalizes well than a decision tree. Why not train the network, then build an equivalent decision tree that just tries to reproduce the network's output? When building the tree from the network, overfitting would not be a concern. In fact, you'd want it to overfit.

You could even say that it only needs to approximately reproduce the output with some tunable error threshold, which might give you leeway for finding more comprehensible and simpler trees.

I think usually if a problem is more solvable by a neural network than a decision tree, there is an underlying reason. Neural nets and decision trees work in very different ways.

Take image classification as an example. CNNs can do it by finding nonlinear patterns that exist. A decision tree would have a very tough time doing it because the pixels have a complicated relationship with each other that defines what the image is of.

I think for something like the data transformations we're talking about, a neural network would be overkill. It looks like this feature in Excel works by comparing the data to predefined formats, probably by searching all known formats in a somewhat intelligent (not AI, just intelligent) way so that it's fast. Then it can output that type of data in whatever form you want.

Your comment gave me an interesting idea though: What if we put neural networks inside of decision trees?

> This might be a dumb question, but let's say that for whatever reason on a specific problem it's much easier to train a neural network that generalizes well than a decision tree. Why not train the network, then build an equivalent decision tree that just tries to reproduce the network's output? When building the tree from the network, overfitting would not be a concern. In fact, you'd want it to overfit.

You haven't fixed anything here. You've just encoded your training data in a neural net and then presented the same problem to the decision tree learner. Unless you're planning to transform your training data somehow?

I don't think I follow here. The goal of training the network isn't to encode the training data in the model, but rather to build a model that generalizes well. If the neural network has just memorized the training examples, then it overfit and really isn't useful in the real world.

I'm imagining a hypothetical example where generalization is easier to achieve with a neural network than with a decision tree using standard training techniques. Then a tree trained on the network might generalize better than a tree trained straight on the original data, with the additional benefit of being less of a black box than the network.

This solves the problem of interpretability. You can't interpret the weights of a neural network, but you can easily follow along a decision tree and see if it's doing what you want.

Actually that's somewhat less true for big decision trees. But the general point is that you can train interpretable models to mimic the output of uninterpretable black boxes.

The biggest issue is that decision trees only work for data with fixed inputs and outputs. Recurrent NNs work on a time series and possibly even have attention mechanisms.

No, it doesn't make sense. The training data (inputs and NN-predicted outputs) that you're feeding into the DT is at best the same as the training data (inputs and desired outputs) you had originally.
You can generate infinite training data with the NN by feeding in random inputs and seeing what outputs it gives. You can then train whatever model you want on it without concern for overfitting.

But more importantly, the decision tree will model the behavior of the NN, not necessarily the original data. Which is what you want, if your goal is to understand what function the NN has learned.
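
Here is a minimal pure-Python sketch of that procedure, under loud assumptions: `black_box` is a single made-up sigmoid unit standing in for a trained network, the inputs are two numbers in [0, 1], and the tree learner is a bare-bones greedy Gini splitter over a fixed grid of thresholds. It samples random inputs, labels them with the black box, fits a small tree, and measures how often the tree agrees with the black box on fresh samples:

```python
import math
import random

def black_box(x, y):
    """Stand-in for a trained network: one fixed-weight sigmoid unit (made up)."""
    return int(1 / (1 + math.exp(-(6 * x + 6 * y - 6))) > 0.5)

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_tree(rows, depth):
    """rows: list of ((x, y), label). Returns a nested dict, or a 0/1 leaf."""
    labels = [label for _, label in rows]
    majority = int(sum(labels) * 2 >= len(labels))
    if depth == 0 or gini(labels) == 0.0:
        return majority
    best = None
    for feature in (0, 1):
        for threshold in [i / 20 for i in range(1, 20)]:
            left = [r for r in rows if r[0][feature] < threshold]
            right = [r for r in rows if r[0][feature] >= threshold]
            if not left or not right:
                continue
            # Weighted impurity of the two children; lower is better.
            score = (len(left) * gini([lab for _, lab in left])
                     + len(right) * gini([lab for _, lab in right]))
            if best is None or score < best[0]:
                best = (score, feature, threshold, left, right)
    if best is None:
        return majority
    _, feature, threshold, left, right = best
    return {"feature": feature, "threshold": threshold,
            "left": fit_tree(left, depth - 1),
            "right": fit_tree(right, depth - 1)}

def predict(tree, point):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        side = "left" if point[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[side]
    return tree

def sample(n, rng):
    """Draw n random inputs and label them with the black box, not real data."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(p, black_box(*p)) for p in points]

rng = random.Random(0)
tree = fit_tree(sample(2000, rng), depth=4)
fresh = sample(2000, rng)
fidelity = sum(predict(tree, p) == label for p, label in fresh) / len(fresh)
print(f"fidelity to the black box: {fidelity:.3f}")
```

The `depth` argument plays the role of the tunable knob mentioned upthread: a shallower tree is easier to read but agrees with the black box less often.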

> When building the tree from the network, overfitting would not be a concern.

True

> tunable error threshold, which might give you leeway for finding more comprehensible and simpler trees

True

However, my guess is you'll wind up achieving only one of (a) a more accurate tree model than you'd get by training it directly, or (b) a significantly more understandable model.

Which is why learning algorithms that produce decision trees are far smarter in the long run. Neural nets might eke out other benefits but there's a lot to be said about justifiable/accountable decisions.
Correct me if I'm wrong, but the only type of decision tree that is comparable to a NN in terms of performance is an ensemble of decision trees, and these are equally hard to interpret as NNs.
Except the big (if not the main) part of modern economics is all about "don't care about long run - that's the only smart strategy".
Except that's not so much a scientific theory but a justification for "quick bucks, everything else be damned".
It's theoretically possible for a neural net to do this; the network just needs to have the explanation as an output. I agree that decision trees would be more reliable and easier to train, but I'm not sure if hardcoding every feature is scalable.
How do you know that the explanation jibes with the other outputs, though? It seems like a turtles-all-the-way-down situation, because now I want to see how it was properly introspective of its own decision making.

Also seems like it’s another magnitude of complexity in the neural net to have it not only train and learn on your inputs, but also train and learn on its own training and learning.

Neural networks don't do anything as sophisticated as self-referential introspection. They just fit the outputs you train them with. The training data you provide would have to include the desired explanations.

Consistency is enforced by the dataset, and also by the model. Both outputs would read from the same hidden layer--the one that encodes the desired transformation.

>How do you know that the explanation jibes with the other outputs, though?

The third neural net would do the checking, obviously.

> the network just needs to have the explanation as an output

And how would you evaluate whether the explanation was correct or not?

You give it explanations as training data and it tries to predict them.
> This is why I want any ML device to be able to explain itself.

This is the problem of lacking explanatory mechanisms in ML.

Note that some techniques that are very out of vogue at the moment, such as Genetic Programming, are much better than neural nets in this regard.

IIRC you _can_ get this, but it's a huge algorithm that doesn't do things in a way that would probably make sense to a human. It would be amazing to be able to transform code into human language.
I was sloppy in my own examples. I’d be perfectly satisfied with an AST, or regex, or other non-English explanation. But something that I can audit is what I’m after. Otherwise this tool would never be trustworthy enough to let loose on billions of rows of data, with silent errors occurring throughout. (Well, I guess it depends on the nature and importance of the data. Cat photos, meh. Drug prescriptions? Ahh!)
Maybe also synthesize and suggest property-based tests, by having the user also specify some invalid examples. Then these checks could be run for each transformation. For instance:

- 123 456 = 123456 valid

- 1234567 = 123567 not valid (dropped 4)

Properties:

- output may not contain whitespace

- no number characters may be dropped

- characters may not be reordered
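
A small sketch of how those three properties could be checked mechanically. The function names and the two candidate transforms are hypothetical, and "reordered" is interpreted here as "the output must be a subsequence of the input":

```python
from collections import Counter

def no_whitespace(before, after):
    return not any(ch.isspace() for ch in after)

def no_digits_dropped(before, after):
    # Every digit must occur in the output as often as in the input.
    def digits(s):
        return Counter(ch for ch in s if ch.isdigit())
    return digits(before) == digits(after)

def not_reordered(before, after):
    # The output must be a subsequence of the input (same relative order).
    remaining = iter(before)
    return all(ch in remaining for ch in after)

PROPERTIES = {
    "output may not contain whitespace": no_whitespace,
    "no number characters may be dropped": no_digits_dropped,
    "characters may not be reordered": not_reordered,
}

def violations(transform, inputs):
    """Map each failing input to the names of the properties it violates."""
    report = {}
    for before in inputs:
        after = transform(before)
        failed = [name for name, check in PROPERTIES.items()
                  if not check(before, after)]
        if failed:
            report[before] = failed
    return report

# Two hypothetical candidate transforms.
def remove_spaces(s):
    return s.replace(" ", "")

def drop_middle_of_seven(s):
    return s[:3] + s[4:] if len(s) == 7 else s

inputs = ["123 456", "1234567"]
print(violations(remove_spaces, inputs))
print(violations(drop_middle_of_seven, inputs))
```

The second transform is flagged on `1234567` because it drops the 4, which matches the "not valid" example above.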

Some ML systems (like decision trees) can give you a comprehensible account of how they made a decision (a list of if-conditionals). Unfortunately many can't do this (random forests, for example). Whether an AI can explain why it does what it does in every situation also depends on the underlying techniques. For instance, a random forest takes random samples of the features and builds a decision tree from each, so the explanation wouldn't be much use to you.
Not true. Look up "partial dependence plots".
This is the stated goal of the Explainable AI initiative (spearheaded afaik by DARPA, though Google tells me corporates have also begun work on it). I hope it works out well because there's going to be a lot of AI code in the near future, and the thought of them all being inscrutable black boxes is pretty scary.
But, you know, if you saw something like, all your visible examples were like the strings

  123 456
  234567
  345 678
and the program replies with something like what you wrote: “Remove the middle character from any 7-character strings to make them 6 characters in length”, it would actually take a programmer’s mind to be able to envision why this might in some cases be wrong. Most people who are not programmers would, I think, see this as equivalent to “Remove interior spaces from each item”. I suspect that the skill required to choose an algorithm correctly is the exact same skill required to actually be a programmer.

All this then buys you is that you don’t have to remember the function names.
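
To make the difference concrete, here is a short sketch (the function names and probe strings are made up) showing that the two rules agree on the visible examples but diverge on inputs nobody showed the tool:

```python
def remove_spaces(s):
    return s.replace(" ", "")

def drop_middle_of_seven(s):
    return s[:3] + s[4:] if len(s) == 7 else s

def disagreements(inputs):
    """Return (input, spaces-rule output, middle-rule output) where the rules differ."""
    return [(s, remove_spaces(s), drop_middle_of_seven(s))
            for s in inputs
            if remove_spaces(s) != drop_middle_of_seven(s)]

visible = ["123 456", "234567", "345 678"]
probes = ["1234567", "12 34", "1 234 5"]

print(disagreements(visible))  # the rules are indistinguishable here
for row in disagreements(probes):
    print(row)
```

Thinking up probes like `1234567` or `12 34` is exactly the part that takes a programmer's mindset.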

Yes, this system should have an intermediate step of spitting out a checklist of clear rules so the user can select the best fit, saving the human the time it would take to search a bloated dropdown of all possible rules.