by hooloovoo_zoo
2777 days ago
This is an intriguingly aggressive comment.

1. No, it's impossible. Actually, the theorems in this paper do not claim to reach zero loss either, as they're all inequalities on the size of the loss. The paper you cite refers to converging to zero loss, as do you in point 2. Perhaps you're referring to error, which is not the loss that is directly optimized.

2. This paper certainly isn't talking about generalization. It doesn't appear to be mentioned once. Your other paper is talking about generalization. The parent asked if this paper is super important. I gave a reason why it isn't super important for most people.

3. Massively overfitting is antithetical to generalizing. Overfitting means fitting to the extent that you're generalizing less well.
2. This paper is centered in a research thrust that IS focused on generalization. See my comment below.
I don't know who most people are but this paper COULD be important in understanding why stochastic gradient works well in practice.
Personally I doubt it very much.
3. Massively overfitting to the training dataset BUT generalizing well is a real phenomenon, and yes, it is very weird. It happens in deep nets and, I believe, AdaBoost: you continue to train after you already have zero 0-1 loss on the training set. I agree this is a weird way to communicate the idea, but it is the term the community uses.
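To make the error-versus-loss distinction concrete, here is a minimal sketch that is not taken from either paper. It assumes a hypothetical four-point, linearly separable 1-D dataset and a one-parameter logistic model trained with plain full-batch gradient descent (the step size and step count are arbitrary choices). The 0-1 training error reaches zero after the first step, while the log loss that is actually being optimized stays positive and keeps shrinking.

```python
import math

# Hypothetical toy data (an assumption for illustration):
# 1-D inputs, linearly separable, labels in {-1, +1}.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]

def log_loss(w):
    """Mean logistic loss, the surrogate that gradient descent optimizes."""
    return sum(math.log1p(math.exp(-y * w * x)) for x, y in zip(xs, ys)) / len(xs)

def zero_one_error(w):
    """Fraction of training points that are misclassified (0-1 loss)."""
    return sum(1 for x, y in zip(xs, ys) if y * w * x <= 0) / len(xs)

def grad(w):
    """Derivative of the mean logistic loss with respect to w."""
    return sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in zip(xs, ys)) / len(xs)

def train(steps, lr=0.5, w=0.0):
    """Full-batch gradient descent; records (0-1 error, log loss) per step."""
    history = []
    for _ in range(steps):
        w -= lr * grad(w)
        history.append((zero_one_error(w), log_loss(w)))
    return w, history

w, history = train(200)
for step in (0, 9, 199):
    err, loss = history[step]
    print(f"step {step + 1}: 0-1 error = {err}, log loss = {loss}")
```

On separable data like this the log loss has no finite minimizer, so it can approach zero but never reach it, which fits reading the theorems as inequalities on the loss rather than claims of exactly zero loss. The sketch says nothing about generalization, since there is no held-out data in it.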