Hacker News
by throw_away_777 3318 days ago
This statement: "Or in other words: the model, its size, hyperparameters, and the optimiser cannot explain the generalisation performance of state-of-the-art neural networks." is not true and is very misleading. Careful selection of the model and its hyperparameters can clearly improve generalization. The article makes a mistake in assuming that reaching zero training error is good or desirable. In fact, a large part of hyperparameter optimization consists of choices that ensure generalization, and fundamental choices such as early stopping, among many others, do determine how well the model generalizes. If your model has zero training error, you have likely made poor choices.
2 comments

Where does the article state that zero training error is a good thing? The authors only show that almost every modern neural network can reach zero training error, even when the labels are randomized (which makes generalization impossible). Hence, these networks can learn the dataset by heart. From that, the authors can use the test error as an indicator of generalization.
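
To make the random-label point concrete, here is a minimal sketch in plain Python. It is not the paper's setup: a linear perceptron with more input dimensions than training points stands in for an overparameterized network, and the data, sizes, seed, and function names are all assumptions chosen for illustration. The labels are coin flips, so there is nothing to generalize from, yet the model can still fit the training set perfectly.

```python
import random


def perceptron_fit(xs, ys, max_epochs=1000):
    """Run perceptron updates until a full pass makes no mistakes."""
    w = [0.0] * len(xs[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every training point is now classified correctly
            break
    return w


def accuracy(w, xs, ys):
    """Fraction of points whose label sign matches the sign of w . x."""
    correct = 0
    for x, y in zip(xs, ys):
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score > 0:
            correct += 1
    return correct / len(xs)


def random_label_experiment(n_train=40, n_test=200, dim=200, seed=0):
    """Fit random labels with dim > n_train, then score on fresh random data."""
    rng = random.Random(seed)

    def sample(n):
        # Gaussian inputs with labels drawn independently of the inputs.
        xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
        ys = [rng.choice([-1, 1]) for _ in range(n)]
        return xs, ys

    train_x, train_y = sample(n_train)
    test_x, test_y = sample(n_test)
    w = perceptron_fit(train_x, train_y)
    return accuracy(w, train_x, train_y), accuracy(w, test_x, test_y)


train_acc, test_acc = random_label_experiment()
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

With these assumed sizes, the training accuracy should reach 1.00 while the test accuracy stays near chance. Nothing about the model or the optimiser changes between this and a run with meaningful labels; only the data does.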

Indeed, careful hyperparameter choice is currently the only key to good generalization. As I understood it, the goal here is rather to show that the link between a network's regularization and its generalization power is far less clear than it is for other ML algorithms such as SVMs.

In short, NN hyperparameters help achieve generalization, but cannot "explain" it. That is the key difference here between practice and theory.

Then what about the following sentence?

"This must be the case because the generalisation performance can vary significantly while they all remain unchanged."

Maybe it was just me, but I read an implied "alone" in the sentence you quoted, i.e.:

"Or in other words: the model, its size, hyperparameters, and the optimiser, alone, cannot explain the generalisation performance of state-of-the-art neural networks."