by manux
3048 days ago
I wonder how documentable the space of hyperparameters really is (which I think is what the OP is poking at), given the way we currently conceive of them and the way experiments currently happen. Often, people either reuse other people's architectures, or simply try 2 or 3 and stick with the best one, changing only the learning rate and such. I also wonder whether it's a computation issue (training is long, so we can only try so many things), or whether we are really working in the wrong hyperparameter space. Maybe there is another space we could be working in, one where "things make more sense", and the HPs we currently use (learning rate, L2 regularization, number of layers, etc.) are just a projection of it.

[edit:] In this analogy, deep learning currently lacks any sort of general theory (in the sense of theories explaining experiments).