by albert94 1110 days ago
The proposed idea here is different.

You can train several smaller models with different hyperparameters under dynamic budgets, i.e. bad configurations are trained for only a few epochs, and good ones for more epochs. Once you find a good hyperparameter configuration for the small-scale model, you train the large model with that configuration.
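To make the "dynamic budget" part concrete, here is a rough sketch using successive halving, which is one common way to do this and not necessarily the exact scheme the paper uses. The toy_validation_loss function is a made-up stand-in for training a small model, and the learning-rate search space, the 16 candidates, and eta=2 are all assumptions for illustration.

```python
import math
import random


def toy_validation_loss(lr, epochs):
    """Hypothetical stand-in for training a small model for `epochs` epochs.

    The loss is lowest near lr = 1e-2 and shrinks as the epoch budget grows.
    """
    return (math.log10(lr) + 2.0) ** 2 + 1.0 / epochs


def successive_halving(configs, min_epochs=1, eta=2):
    """Keep the best 1/eta of the configs each round and give them eta times more epochs.

    Returns the surviving config and the total number of epochs spent.
    For simplicity, each round is counted as training from scratch.
    """
    survivors = list(configs)
    epochs = min_epochs
    total_epochs = 0
    while len(survivors) > 1:
        # Evaluate every surviving config at the current budget.
        ranked = sorted(survivors, key=lambda lr: toy_validation_loss(lr, epochs))
        total_epochs += epochs * len(survivors)
        # Bad configs stop here; good ones get a larger budget next round.
        survivors = ranked[: max(1, len(survivors) // eta)]
        epochs *= eta
    return survivors[0], total_epochs


rng = random.Random(0)
# Assumed search space: learning rates sampled log-uniformly from [1e-5, 1].
candidates = [10 ** rng.uniform(-5, 0) for _ in range(16)]
best_lr, spent = successive_halving(candidates)
print(f"best small-scale lr: {best_lr:.5f}, epochs spent: {spent}")
```

The learning rate that survives is the one you would then reuse for the single large-model run. Whether a configuration found at small scale actually transfers to the large model is the part the paper has to justify; the sketch only covers the budgeting.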

What is being shown is that the overhead of doing hyperparameter optimization at a small scale is comparable to a single optimization at the largest scale.

Overall, the idea looks very cool.