
Oh, you touched on my favorite topic: whole-dataset training.

Take a look at [1] and go straight to page 8, Figure 2(b).

[1] http://proceedings.mlr.press/v48/taylor16.pdf

The paper is about whole-dataset training, and one of the datasets it uses is HIGGS [2]. Figure 2(b) compares two whole-dataset training approaches (L-BFGS and ADMM) against SGD. SGD basically tops out at the accuracy where both whole-dataset approaches start.

[2] https://archive.ics.uci.edu/ml/datasets/HIGGS#

HIGGS is a strange dataset. It is narrow, with only 28 features (29 columns counting the class label). It is also relatively long, about 11M samples (10M to train, 0.5M to validate and the last 0.5M to test). It is also hard to get right with SGD.

But if you perform whole-dataset optimization, even a linear model (logistic regression, in my case) can get you good accuracy [3] (some experiments of mine).

[3] https://github.com/thesz/higgs-logistic-regression
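To make "whole-dataset optimization" concrete, here is a minimal sketch of full-batch logistic regression fitted with L-BFGS, next to plain minibatch SGD on the same loss. It is not the code from [3] and not the setup from [1]: the data is synthetic (28 random features standing in for HIGGS), and every name in it (make_data, loss_and_grad, fit_lbfgs, fit_sgd, accuracy) is made up for illustration. On a small convex toy problem like this the gap between the two will be small, so it only shows the mechanics: L-BFGS sees the loss and gradient over all samples at every step, SGD sees one minibatch at a time.

```python
import numpy as np
from scipy.optimize import minimize


def make_data(n=20000, d=28, seed=0):
    """Synthetic stand-in for HIGGS (assumption): d features, noisy binary labels."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    p = 1.0 / (1.0 + np.exp(-(X @ w_true)))
    y = (rng.random(n) < p).astype(float)
    return X, y


def loss_and_grad(w, X, y):
    """Mean logistic loss and its gradient over the rows of X."""
    z = X @ w
    # log(1 + exp(z)) - y * z, with logaddexp for numerical stability
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    p = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid
    grad = X.T @ (p - y) / len(y)
    return loss, grad


def fit_lbfgs(X, y):
    """Whole-dataset training: every L-BFGS step uses the full-batch gradient."""
    w0 = np.zeros(X.shape[1])
    res = minimize(loss_and_grad, w0, args=(X, y), jac=True, method="L-BFGS-B")
    return res.x


def fit_sgd(X, y, lr=0.1, epochs=1, batch=100, seed=0):
    """Minibatch SGD with a constant step size (hyperparameters are arbitrary)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            idx = order[start:start + batch]
            _, g = loss_and_grad(w, X[idx], y[idx])
            w -= lr * g
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted class (score > 0) matches the label."""
    return np.mean((X @ w > 0) == (y > 0.5))


if __name__ == "__main__":
    X, y = make_data()
    for name, w in [("L-BFGS", fit_lbfgs(X, y)), ("SGD", fit_sgd(X, y))]:
        loss, _ = loss_and_grad(w, X, y)
        print(f"{name}: train loss {loss:.5f}, train accuracy {accuracy(w, X, y):.4f}")
```

To try it on the real thing you would swap make_data for a loader of the HIGGS CSV from [2] and evaluate on the held-out rows instead of the training set.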