by dna_polymerase 3226 days ago
> Given enough examples, computers can understand what is letter "A" and what is letter "B".

Meh.

Given enough examples, computers can now distinguish the letters A and B, but distinguishing is not understanding. You could argue that after training, the network just follows an instruction set; from the outside that may give the impression of understanding, but it doesn't really understand anything. Isn't that basically the Chinese room thing?

In fact recent research indicates that you can randomly relabel the training examples and the network still achieves zero training error (https://arxiv.org/abs/1611.03530). So it is not "understanding" anything intrinsic or fundamental about the letter "A". Rather, it's just storing training examples somewhere inside of its millions of parameters, which sounds a lot less impressive.
That is not a conclusion that can be drawn from the findings in the paper. While the models they evaluate can achieve zero training error on random labels, the test error is obviously not zero: it doesn't generalize at all. However, training on real labels often finds solutions which can generalize quite well.

A better way to summarize the central question of this paper would be: "Why is it that a large-parameter model trained with gradient descent on real data _could_ just memorize all of the training data (it has the capacity) yet finds solutions which generalize well to an unseen test set?"

To say that deep learning is _just_ memorizing its training data would be incorrect. We have empirical evidence to the contrary and this paper is part of that evidence.
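
To make the memorize-vs-generalize distinction concrete, here is a rough sketch of the random-label experiment in miniature. It is not the paper's setup: it assumes a 1-nearest-neighbour classifier as a stand-in for a model with enough capacity to memorize everything, and two synthetic 2-D Gaussian blobs instead of images. All names (`make_data`, `predict_1nn`, `run_experiment`) are made up for illustration.

```python
import random

def make_data(n, rng):
    """Two well-separated 2-D Gaussian blobs: label 0 near (-2, -2), label 1 near (2, 2)."""
    points, labels = [], []
    for i in range(n):
        label = i % 2
        center = 2.0 if label else -2.0
        points.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        labels.append(label)
    return points, labels

def predict_1nn(train_x, train_y, query):
    """Return the label of the stored training point closest to `query` (pure memorization)."""
    best = min(
        range(len(train_x)),
        key=lambda i: (train_x[i][0] - query[0]) ** 2 + (train_x[i][1] - query[1]) ** 2,
    )
    return train_y[best]

def accuracy(train_x, train_y, xs, ys):
    """Fraction of points in (xs, ys) that the memorizer labels correctly."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)

def run_experiment(seed=0, n_train=200, n_test=200):
    rng = random.Random(seed)
    train_x, train_y = make_data(n_train, rng)
    test_x, test_y = make_data(n_test, rng)
    # Random relabelling: destroys any relationship between inputs and labels.
    random_y = [rng.randint(0, 1) for _ in train_y]
    return {
        "true_train": accuracy(train_x, train_y, train_x, train_y),
        "true_test": accuracy(train_x, train_y, test_x, test_y),
        "random_train": accuracy(train_x, random_y, train_x, random_y),
        "random_test": accuracy(train_x, random_y, test_x, test_y),
    }

results = run_experiment()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Both runs fit their training set perfectly, because every training point is its own nearest neighbour. Only the run with real labels does well on the held-out points; the randomly relabelled one should hover around chance. The toy says nothing about why gradient descent on a deep net lands on the generalizing solution when it could memorize, which is the open question the paper raises.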

But we also have empirical evidence that they generalize incredibly poorly, namely the existence of imperceptible (adversarial) perturbations which can transfer across images and networks and are catastrophically misclassified.
Adversarial examples don't really support the claim that deep models are just memorizing examples. If they were, they wouldn't generalize to unseen examples at all. However, the human brain is also susceptible to adversarial examples (e.g. optical illusions). Yet human brains still generalize quite well. Likewise, deep learning can both suffer from adversarial examples and generalize well.

Generalization is a multi-axis scale, not a switch: you can have more or less generalization in many different dimensions. Being terrible at adversarial examples just means that axis is weak.

Generalization error is a number. You can debate the probability space over which it should be computed, but it exists in one dimension. Otherwise statements like "generalize well" make no sense. Anyways, I'm not aware of any optical illusion that can make the brain confuse a house cat with a door knob. Yet it is apparently possible to make any one of a number of neural nets do so, using the same technique and with only subtle, imperceptible changes to the input. So they cannot possibly be learning any sort of intrinsic representation of these objects. I'm guilty of being facetious in using the word "just", since of course deep nets (and all other serious ML algorithms I know of) are able to generalize to an extent. What's not clear to me is whether this represents some paradigm shift in AI, as is often claimed, or if it's simply the consequence of fitting a hugely overparameterized function approximator to a web-scale amount of training data.
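
For reference, the "generalization error is a number" point can be written in the usual textbook form. The symbols here (model f, data distribution D, loss ℓ, training sample S of size n) are my notation, not anything taken from the thread or the paper:

```latex
R_{\mathcal{D}}(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f(x), y)\big],
\qquad
\hat{R}_{S}(f) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)
```

The first quantity is the expected loss on fresh data and the second is the training error; the difference between them is the generalization gap. The "probability space you can debate" is the choice of D: natural images give one number, adversarially perturbed images can give a very different one for the same f.
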
They say some weird stuff in this paper:

>"Specifically, we take a candidate architecture and train it both on the true data and on a copy of the data in which the true labels were replaced by random labels. In the second case, there is no longer any relationship between the instances and the class labels. As a result, learning is impossible."

This is like saying learning someone's phone number is impossible because there is no relationship between the person and the number.

It's more like giving you a bag of random house numbers and instructing you to place the numbers on the correct houses in an area you've never been to before. An instructor teaches you where some of the numbers go, and you can memorize those examples, but when the instructor leaves you to finish the job on your own you have no way of knowing how to assign the remaining numbers.

Memorization is pretty easy. Generalizing from past examples requires that there be a relationship not just between one person and their phone number but between all people and their phone numbers.

Sure, that's an even better example. I guess, try as I might, I cannot grasp what people are finding interesting about this paper.
>'So it is not "understanding" anything intrinsic or fundamental about the letter "A".'

What is there to understand? As far as I know the shapes we use for letters are arbitrary (at least at this point).