HN Mirror


by magicalhippo 110 days ago

Maybe some newer references are better, but my mind went to the Model Soups paper[1]:

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups."
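To make the averaging step concrete, here is a toy sketch of what I understand a plain uniform average to be. It assumes every fine-tuned model has the same architecture, so the parameter names and shapes line up. The `uniform_soup` name and the dict-of-lists "state dict" are my own stand-ins for illustration, not the paper's code or any library's API.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a real checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def uniform_soup(state_dicts: Sequence[StateDict]) -> StateDict:
    """Average the weights of several fine-tuned models, parameter by parameter."""
    if not state_dicts:
        raise ValueError("need at least one model to make a soup")

    reference = state_dicts[0]
    for sd in state_dicts[1:]:
        # All models must share the same parameter names and sizes.
        if sd.keys() != reference.keys():
            raise ValueError("models have different parameter names")
        for name in reference:
            if len(sd[name]) != len(reference[name]):
                raise ValueError(f"size mismatch for parameter {name!r}")

    n = len(state_dicts)
    soup: StateDict = {}
    for name in reference:
        # zip(*...) groups the i-th weight of this parameter across all models.
        per_weight = zip(*(sd[name] for sd in state_dicts))
        soup[name] = [sum(values) / n for values in per_weight]
    return soup


if __name__ == "__main__":
    # Two made-up "fine-tuned" models with one parameter each.
    run_a = {"layer.weight": [1.0, 2.0]}
    run_b = {"layer.weight": [3.0, 4.0]}
    print(uniform_soup([run_a, run_b]))
```

The result is a single set of weights the same size as any one input model, which is why inference cost and memory stay the same as for a single model, unlike an ensemble that has to run every member.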

[1]: https://arxiv.org/abs/2203.05482