Ah, I think I see what you mean: you are saying that because it's better than humans at predicting the next element in a sequence, it's good at generalising. Is that correct, or am I misrepresenting your point?
Basically there are two approaches to sequence prediction.
The first is the traditional style (linear regression, ARIMA, RNNs, etc.), where you directly predict the next element in the sequence. The output is at the same level of abstraction as the internal values used in the model.
The second is the newer style, where you predict symbols instead of predicting the values directly. You can predict symbols representing numbers, or you can predict a symbolic formula that can be used to extrapolate the values perfectly. This is the way humans do it.
And my point is that when you look at the symbol embeddings, they do have interpretable structure that the model can use to generalize. And experiments seem to suggest that DNN models are indeed generalizing.
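To make the distinction concrete, here is a toy sketch of the two styles on the sequence of squares. It is only an illustration under my own assumptions: `predict_direct`, `predict_symbolic`, and the hand-written `CANDIDATES` table are hypothetical names, and a real symbolic model would learn to emit formula tokens rather than search a fixed list.

```python
import numpy as np


def predict_direct(seq):
    """Value-level style: fit x[t+1] ~ a*x[t] + b by least squares."""
    x = np.asarray(seq[:-1], dtype=float)
    y = np.asarray(seq[1:], dtype=float)
    design = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return a * seq[-1] + b


# Hypothetical, hand-picked candidate formulas (an assumption for this sketch).
CANDIDATES = {
    "n": lambda n: n,
    "n**2": lambda n: n ** 2,
    "2**n": lambda n: 2 ** n,
    "n*(n+1)//2": lambda n: n * (n + 1) // 2,
}


def predict_symbolic(seq):
    """Symbol-level style: pick a formula that reproduces every observed term."""
    for name, formula in CANDIDATES.items():
        if all(formula(n) == value for n, value in enumerate(seq, start=1)):
            return name, formula(len(seq) + 1)
    return None, None


squares = [1, 4, 9, 16, 25, 36]
direct = predict_direct(squares)            # a linear fit on values misses the true next term
name, symbolic = predict_symbolic(squares)  # selects "n**2" and extrapolates exactly
print(f"direct: {direct:.2f}, symbolic: {name} -> {symbolic}")
```

The value-level fit can only approximate the next term, while the symbolic route extrapolates exactly once the right formula is found, which is the sense of "perfectly" above.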
OK, thanks for the explanation. I think I understand what you mean. But this kind of generalisation takes very careful analysis to discern, and I'm not convinced yet. I'll be more easily convinced when I see something blatant, and n ≤ 10 is so far not there for me, even given the shift in what is predicted.