by sepiasaucer 1612 days ago

“One can, in theory, start with lots of architectures, then optimize each one and pick the best. ‘But training [takes] a pretty nontrivial amount of time,’ said Mengye Ren, now a visiting researcher at Google Brain. It’d be impossible to train and test every candidate network architecture. ‘[It doesn’t] scale very well, especially if you consider millions of possible designs.’”

——-

If you wanted to find the best architecture in order to maximize accuracy, why not just train a model to predict accuracy (not parameters) given the architecture, and then optimize over that model?

This seems similar to optimizing any expensive black-box function: fit a cheap approximation (i.e., a surrogate model) and then optimize over the cheap model.
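
To make the idea concrete, here is a rough sketch of what I mean, not anything from the article. Everything in it is made up for illustration: `expensive_accuracy` is a toy stand-in for actually training a network, an architecture is just a hypothetical `(depth, width)` pair, and the surrogate is a simple nearest-neighbor average rather than a learned model.

```python
import itertools
import random


def expensive_accuracy(arch):
    """Toy stand-in for 'train the network and measure accuracy' (assumption)."""
    depth, width = arch
    return 1.0 - 0.002 * (depth - 12) ** 2 - 0.00001 * (width - 256) ** 2


def fit_knn_surrogate(samples, k=3):
    """Cheap predictor: inverse-distance-weighted mean of the k nearest samples."""
    def predict(arch):
        # Scale each axis so depth and width contribute comparably to distance.
        nearest = sorted(
            (((arch[0] - a[0]) / 20) ** 2 + ((arch[1] - a[1]) / 512) ** 2, acc)
            for a, acc in samples
        )[:k]
        weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
        return sum(w * acc for w, (_, acc) in zip(weights, nearest)) / sum(weights)
    return predict


def surrogate_search(candidates, budget=20, top=5, seed=0):
    rng = random.Random(seed)
    # 1. Pay the expensive cost only for a small random sample.
    evaluated = [(a, expensive_accuracy(a)) for a in rng.sample(candidates, budget)]
    # 2. Fit the cheap surrogate and rank every candidate with it.
    predict = fit_knn_surrogate(evaluated)
    ranked = sorted(candidates, key=predict, reverse=True)[:top]
    # 3. Verify only the surrogate's top picks with the expensive function.
    evaluated += [(a, expensive_accuracy(a)) for a in ranked]
    return max(evaluated, key=lambda pair: pair[1]), len(evaluated)


candidates = list(itertools.product(range(2, 22), range(32, 544, 32)))
(best_arch, best_acc), n_evals = surrogate_search(candidates)
print(best_arch, round(best_acc, 4), f"{n_evals} expensive calls for {len(candidates)} candidates")
```

The point is only the shape of the loop: the expensive function is called `budget + top` times, while the surrogate gets queried for every candidate. With a real search space you would presumably want a better surrogate and several rounds of refitting.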