by sepiasaucer
1612 days ago
“One can, in theory, start with lots of architectures, then optimize each one and pick the best. ‘But training [takes] a pretty nontrivial amount of time,’ said Mengye Ren, now a visiting researcher at Google Brain. It’d be impossible to train and test every candidate network architecture. ‘[It doesn’t] scale very well, especially if you consider millions of possible designs.’”

If you wanted to find the best architecture in order to maximize accuracy, why not just train a model to predict accuracy (not parameters) given the architecture, and then optimize over that model? This seems similar to optimizing any expensive black-box function: fit a cheap approximation (i.e., a surrogate model) and then optimize over the cheap model.
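
A minimal sketch of the surrogate idea, under loud assumptions: an architecture is reduced to a hypothetical (depth, width) pair, `expensive_accuracy` is a made-up noiseless stand-in for actually training and evaluating a network, and the surrogate is a plain quadratic least-squares fit rather than anything a real architecture search would necessarily use. It only illustrates the loop "evaluate a few, fit a cheap model, optimize over the cheap model".

```python
import itertools
import numpy as np

def expensive_accuracy(depth, width):
    """Hypothetical stand-in for 'train the network and measure accuracy'."""
    return 0.9 - 0.002 * (depth - 12) ** 2 - 0.00001 * (width - 256) ** 2

def features(archs):
    """Quadratic features of (depth, width), rescaled to roughly [0, 1]."""
    a = np.asarray(archs, dtype=float)
    d, w = a[:, 0] / 20.0, a[:, 1] / 512.0
    return np.column_stack([np.ones_like(d), d, w, d * w, d**2, w**2])

# Full search space: every (depth, width) combination we might consider.
candidates = list(itertools.product(range(2, 21), range(32, 513, 32)))

# Pay the expensive cost for only a small random sample of architectures.
rng = np.random.default_rng(0)
train_idx = rng.choice(len(candidates), size=20, replace=False)
train_archs = [candidates[i] for i in train_idx]
train_acc = np.array([expensive_accuracy(d, w) for d, w in train_archs])

# Fit the cheap surrogate: least-squares regression from features to accuracy.
coef, *_ = np.linalg.lstsq(features(train_archs), train_acc, rcond=None)

# Optimize over the surrogate: score every candidate cheaply, keep the best.
predicted = features(candidates) @ coef
best_arch = candidates[int(np.argmax(predicted))]

print("evaluated:", len(train_archs), "of", len(candidates))
print("surrogate's pick:", best_arch)
```

In practice the true accuracy is noisy and not quadratic, so one would typically re-evaluate the surrogate's top picks with real training and refit, which is roughly what Bayesian optimization does with a probabilistic surrogate.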