Hacker News

	by munchler 457 days ago
	In the ML problem I'm working on now, there are about a dozen simple hyperparameters, and each training run takes hours or even days. I don't think there's any good way to search the space of hyperparameters without a deep understanding of the problem domain, and even then I'm often surprised when a minor config tweak yields better results (or fails to). Many of these hyperparameters affect performance directly and are very sensitive to hardware limits, so a bad value leads to an out-of-memory error in one direction or a runtime measured in years in the other. It's a real-world halting problem on steroids. And that's not even counting more complex design decisions, like the architecture of the model, which can't be captured in a simple hyperparameter.
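For a sense of scale, here is a back-of-the-envelope sketch of why an exhaustive grid is out of reach. The numbers are assumptions for illustration only (12 hyperparameters, 3 candidate values each, 4 hours per run); the comment itself says only "about a dozen" hyperparameters and "hours or even days" per run.

```python
# Back-of-the-envelope cost of an exhaustive grid search.
# Assumed numbers (illustrative only, not from the comment above):
NUM_HYPERPARAMS = 12   # "about a dozen"
VALUES_PER_PARAM = 3   # e.g. low / medium / high for each knob
HOURS_PER_RUN = 4.0    # "hours or even days" per training run

HOURS_PER_YEAR = 24 * 365


def grid_cost(num_params: int, values_per_param: int, hours_per_run: float) -> tuple[int, float]:
    """Return (number of grid points, sequential wall-clock years)."""
    runs = values_per_param ** num_params  # full Cartesian product of values
    years = runs * hours_per_run / HOURS_PER_YEAR
    return runs, years


runs, years = grid_cost(NUM_HYPERPARAMS, VALUES_PER_PARAM, HOURS_PER_RUN)
print(f"{runs} runs, roughly {years:.0f} years if run one after another")
```

Under these assumptions the grid has 3^12 = 531,441 points, or roughly 243 years of sequential compute, before any run fails from running out of memory.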

5 comments

pama 456 days ago

Optuna often works fine in this context (even with the memory errors or, with some tuning, with the non-halting runs): https://github.com/optuna/optuna
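The pattern behind this is simple: treat a crashed or over-budget run as a failed trial and keep searching. Below is a minimal stdlib sketch using plain random search, assuming a hypothetical `train` function whose toy cost model stands in for a real training run; it is not how Optuna is implemented. (In Optuna itself, `study.optimize` accepts a `catch` argument listing exception types to record as failed trials.)

```python
import math
import random

# Hypothetical resource limits in arbitrary units (assumptions for illustration).
MEMORY_LIMIT = 2 ** 20
TIME_BUDGET = 1_000.0


class BudgetExceeded(Exception):
    """Raised when a run would blow past the wall-clock budget."""


def train(batch_size: int, lr: float) -> float:
    """Hypothetical stand-in for an expensive training run; returns a loss."""
    if batch_size * 4096 > MEMORY_LIMIT:    # too big: simulated out-of-memory
        raise MemoryError("simulated OOM")
    if 100_000 / batch_size > TIME_BUDGET:  # too small: simulated runaway run
        raise BudgetExceeded("simulated timeout")
    # Toy loss surface: best near lr = 1e-3, slight preference for larger batches.
    return (math.log10(lr) + 3) ** 2 + 1 / batch_size


def random_search(n_trials: int, seed: int = 0):
    """Random search that records failures instead of crashing the whole sweep."""
    rng = random.Random(seed)
    best = None  # (loss, params) of the best successful trial so far
    failures = {"oom": 0, "timeout": 0}
    for _ in range(n_trials):
        params = {
            "batch_size": 2 ** rng.randint(4, 10),  # 16 .. 1024
            "lr": 10 ** rng.uniform(-5, -1),        # log-uniform sampling
        }
        try:
            loss = train(**params)
        except MemoryError:
            failures["oom"] += 1
            continue
        except BudgetExceeded:
            failures["timeout"] += 1
            continue
        if best is None or loss < best[0]:
            best = (loss, params)
    return best, failures


best, failures = random_search(n_trials=200)
print("best:", best)
print("failed trials:", failures)
```

In a real setup the timeout would typically be enforced by running each trial in its own process and killing it after a deadline, rather than by a cost estimate as in this toy.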

lamename 457 days ago

You might find this helpful for prioritizing which knobs to turn first https://github.com/google-research/tuning_playbook

OccamsMirror 457 days ago

Starting to get a bit out of date. Pity they stopped updating it.

ayepif 456 days ago

Pity indeed! Do you have any suggested resources that are more up-to-date?

lamename 456 days ago

Last commit was a year ago. They welcome contributions; see CONTRIBUTING.md.

brandonpelfrey 457 days ago

Are you already employing Bayesian optimization techniques? These are commonly used to explore spaces where evaluation is expensive.

riedel 457 days ago

They also depend on the design space being somewhat friendly in nature, so that it can be modelled by a surrogate and the exploit/explore trade-off can be captured in an acquisition function.
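That description maps onto a small loop: fit a surrogate to the runs so far, then pick the next configuration by maximizing an acquisition function that balances exploring uncertain regions against exploiting promising ones. Here is a minimal pure-Python sketch for a single hyperparameter scaled to [0, 1], assuming a zero-mean Gaussian-process surrogate with an RBF kernel (length scale 0.2, chosen by hand), expected improvement as the acquisition function, and a toy objective in place of a real training run.

```python
import math

LENGTH_SCALE = 0.2  # assumed RBF kernel length scale on the [0, 1] search interval
JITTER = 1e-6       # small diagonal term for numerical stability


def rbf(a: float, b: float) -> float:
    return math.exp(-0.5 * ((a - b) / LENGTH_SCALE) ** 2)


def cholesky(m):
    """Lower-triangular L with L @ L.T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def forward_sub(low, b):
    """Solve low @ y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def backward_sub(low, y):
    """Solve low.T @ x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs, ys, x):
    """Posterior mean and std of the zero-mean GP surrogate at point x."""
    k = [[rbf(a, b) + (JITTER if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    alpha = backward_sub(low, forward_sub(low, ys))
    k_star = [rbf(x, a) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = forward_sub(low, k_star)
    var = max(rbf(x, x) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """EI for minimization: E[max(best - f(x), 0)] under the surrogate."""
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, n_iter=10):
    xs = [0.0, 0.5, 1.0]                  # initial design
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]  # candidate points for the acquisition
    for _ in range(n_iter):
        best = min(ys)
        candidates = [x for x in grid if x not in xs]
        x_next = max(candidates,
                     key=lambda c: expected_improvement(*gp_posterior(xs, ys, c), best))
        xs.append(x_next)
        ys.append(objective(x_next))  # the expensive "training run"
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]


# Toy 1-D "validation loss" standing in for an expensive training run (hypothetical).
x_best, y_best = bayes_opt(lambda x: (x - 0.3) ** 2)
print(f"best x = {x_best:.3f}, loss = {y_best:.4f}")
```

Note that the kernel length scale, the jitter, and the initial design are themselves knobs fixed by hand here; practical implementations usually fit at least the kernel parameters from the data, which is the "hyperparams for hyperparam searches" problem in miniature.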

Successive halving, for example, likewise builds on assumptions about how the learning curve develops.
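Successive halving spends a small budget on many configurations and keeps training only the ones that look best early. A minimal sketch, assuming a hypothetical `train_for(config, budget)` that returns a validation loss after `budget` units of training, and a halving rate `eta = 2`. The toy learning curve is deliberately well-behaved; if real curves cross (a slow starter that ends up best), this procedure can discard the eventual winner, which is exactly the assumption mentioned above.

```python
import math


def successive_halving(configs, train_for, min_budget=1, eta=2):
    """Evaluate all configs on a small budget, keep the best 1/eta, repeat with eta x the budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Lower loss is better; sort by loss only so configs never need to be comparable.
        scored = sorted(((train_for(c, budget), c) for c in survivors), key=lambda p: p[0])
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]
        budget *= eta
    return survivors[0]


def toy_learning_curve(lr: float, budget: int) -> float:
    """Hypothetical learning curve: loss shrinks with budget, floor depends on lr.

    The additive 1/budget term keeps the ranking of configs identical at every
    budget, i.e. it bakes in the learning-curve assumption the method relies on.
    """
    return (math.log10(lr) + 3) ** 2 + 1 / budget


lrs = [10 ** (-5 + 0.25 * i) for i in range(16)]  # 1e-5 .. ~5.6e-2, log-spaced
best_lr = successive_halving(lrs, toy_learning_curve)
print(f"selected lr = {best_lr:.1e}")
```

With 16 configurations and `eta = 2`, the rungs evaluate 16, 8, 4, and 2 configurations at budgets 1, 2, 4, and 8, so most of the compute goes to the few survivors.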

The bottom line is that hyperparameter searches have hyperparameters of their own, so one starts building hyperparameter heuristics on top of the hyperparameter search.

In the end there is no free lunch. But if a hyperparameter search strategy works reasonably well in a domain, it is a great tool. The good thing is that one can typically encode the design space more easily in black-box optimization algorithms.

jampekka 456 days ago

I've been wondering how the training process of the huge models works in practice. If an optimization run costs millions, they probably don't just run a grid of hyperparameters.

yorwba 456 days ago

Run a grid of hyperparameters for small models of different sizes to find out how the optimal values change as you scale up (the "scaling laws"), then extrapolate to predict performance at even larger scales, then do a single large run and hope that your predictions aren't too far off.
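The extrapolation step can be sketched as a straight-line fit in log-log space. The sweep results below are synthetic (generated from an exact power law with exponent -0.3), and fitting the best learning rate against parameter count is just one illustrative choice of "optimal value"; real scaling-law studies fit noisier data and often use more elaborate functional forms.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via linear regression on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b


# Hypothetical sweep: best learning rate found by a small grid at each model size.
sizes = [1e6, 3e6, 1e7, 3e7, 1e8]                      # parameter counts
best_lrs = [0.02 * (n / 1e6) ** -0.3 for n in sizes]  # synthetic, exact power law

a, b = fit_power_law(sizes, best_lrs)
pred = a * 1e10 ** b  # extrapolate two orders of magnitude past the largest run
print(f"fitted exponent b = {b:.3f}, predicted lr at 1e10 params = {pred:.2e}")
```

The fit itself is cheap; the risk lies entirely in the extrapolation, since nothing in the small-scale data guarantees the power law still holds at the size of the single large run.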

logicchains 456 days ago

That's the advantage of deep learning over traditional ML: if you've got enough data, you don't need domain knowledge or hyperparameter tuning; just throw a large enough universal approximator at it. The challenge lies in generating good enough artificial data for domains without enough data, and in getting deep models to perform competitively with simpler models.
