by skeptic_69 2776 days ago
1. Mmm, OK, I am willing to accept that you meant the quadratic loss instead of the 0-1 error. That seems reasonable.
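For anyone following along, here is why the distinction matters: the two losses can disagree sharply on the same predictions. This is a minimal sketch, assuming labels in {-1, +1} and a real-valued score per example. The function names and the toy numbers are made up for illustration.

```python
def zero_one_error(labels, scores):
    """Fraction of examples where sign(score) disagrees with the label (score 0 counts as wrong)."""
    mistakes = sum(1 for y, s in zip(labels, scores) if y * s <= 0)
    return mistakes / len(labels)

def quadratic_loss(labels, scores):
    """Mean squared difference between the label and the raw score."""
    return sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels)

# Hypothetical predictions: every sign is right, but the magnitudes are off.
labels = [1, -1, 1, -1]
scores = [0.1, -0.1, 3.0, -3.0]

print(zero_one_error(labels, scores))  # 0.0
print(quadratic_loss(labels, scores))  # about 2.405
```

Every prediction has the correct sign, so the 0-1 error is zero, while the quadratic loss stays well above zero. A claim about one loss does not transfer automatically to the other.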

2. This paper is part of a research thrust that IS focused on generalization. See my comment below.

I don't know who "most people" are, but this paper COULD be important for understanding why stochastic gradient descent works well in practice.

Personally I doubt it very much.

3. Massively overfitting the training set BUT still generalizing well is a real phenomenon, and yes, it is very weird. It happens in deep nets and, I believe, in AdaBoost, i.e. when you keep training after the 0-1 training loss has already reached zero. I agree this is a weird way to communicate the idea, but it is the terminology the community uses.
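A toy illustration of "continuing to train after zero 0-1 loss". It uses plain logistic regression trained by full-batch gradient descent on a linearly separable dataset, as a simple stand-in for a deep net or AdaBoost (a swapped-in technique, not the setting of the paper). The data, step size, and step count are made-up assumptions.

```python
import math

# Hypothetical, linearly separable 2-D toy data with labels in {-1, +1}.
X = [(2.0, 1.0), (1.5, 2.0), (3.0, 0.5), (-1.0, -2.0), (-2.0, -0.5), (-1.5, -1.5)]
y = [1, 1, 1, -1, -1, -1]

def margin(w, x, t):
    """Signed margin t * (w . x); positive means correctly classified."""
    return t * (w[0] * x[0] + w[1] * x[1])

def zero_one_error(w):
    """Fraction of training points misclassified by sign(w . x)."""
    return sum(1 for x, t in zip(X, y) if margin(w, x, t) <= 0) / len(X)

def logistic_loss(w):
    """Mean log(1 + exp(-margin)), a smooth surrogate for the 0-1 loss."""
    return sum(math.log1p(math.exp(-margin(w, x, t))) for x, t in zip(X, y)) / len(X)

def gradient(w):
    """Gradient of the mean logistic loss with respect to w."""
    g = [0.0, 0.0]
    for x, t in zip(X, y):
        # d/dm log(1 + e^{-m}) = -1 / (1 + e^{m}), and dm/dw = t * x
        coef = -t / (1.0 + math.exp(margin(w, x, t)))
        g[0] += coef * x[0] / len(X)
        g[1] += coef * x[1] / len(X)
    return g

def train(steps, lr=0.5):
    """Full-batch gradient descent from w = 0; records (step, 0-1 error, logistic loss)."""
    w = [0.0, 0.0]
    history = []
    for step in range(1, steps + 1):
        g = gradient(w)
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
        history.append((step, zero_one_error(w), logistic_loss(w)))
    return w, history

w, history = train(1000)
for step, err, loss in history:
    if step in (1, 10, 100, 1000):
        print(f"step {step:4d}  0-1 error {err:.2f}  logistic loss {loss:.5f}")
```

On this data the 0-1 training error is already zero after the first step, yet the surrogate loss keeps shrinking and the weights keep growing for the rest of training. That is the regime people mean by training past zero 0-1 loss; whether doing so helps test error is the weird part.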