by skeptic_69
2776 days ago
1. Mmm, OK, I am willing to accept that you meant the quadratic loss instead of 0-1 error. That seems reasonable.
2. This paper is centered in a research thrust that IS focused on generalization; see my comment below. I don't know who "most people" are, but this paper COULD be important in understanding why stochastic gradient works well in practice. Personally, I doubt it very much.
3. Massively overfitting to the training dataset BUT generalizing well is a real phenomenon, and yes, it is very weird. It happens in deep nets and, I believe, AdaBoost, i.e. when you continue to train after you have already reached zero 0-1 loss. I agree this is a weird way to communicate the idea, but that is the terminology the community uses.
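
To make the distinction in points 1 and 3 concrete, here is a minimal sketch in Python. The labels and scores are made-up toy numbers, and the function names are hypothetical, not taken from the paper. It only shows that the 0-1 error on the training set can already be zero while the quadratic loss is still positive and keeps shrinking, which is why further training still changes the model.

```python
# Toy illustration only: labels are +1/-1, scores are real-valued model outputs.
# All numbers below are invented for the example.

def zero_one_error(labels, scores):
    """Fraction of examples where the sign of the score disagrees with the label."""
    wrong = sum(1 for y, s in zip(labels, scores) if (s >= 0) != (y > 0))
    return wrong / len(labels)

def quadratic_loss(labels, scores):
    """Mean squared difference between score and label."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)

labels = [1, -1, 1, -1]

# Hypothetical "early" model: every sign is already correct, but margins are small.
early_scores = [0.3, -0.2, 0.6, -0.1]

# Hypothetical "later" model: same signs, scores pushed closer to the labels.
later_scores = [0.6, -0.4, 1.2, -0.2]

for name, scores in [("early", early_scores), ("later", later_scores)]:
    print(name, zero_one_error(labels, scores), quadratic_loss(labels, scores))
```

Both score vectors classify every training point correctly, so the 0-1 error cannot tell them apart, while the quadratic loss still can.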