GSD, also known in the literature as "Graduate Student Descent."
I'm not even joking. Trial and error. Having good "intuition" about past ideas and the basic building blocks to guide that trial and error. Reading research papers, seeing what worked well for other people, and using that.
As an aside, this is the principal reason I am skeptical of grandiose claims about deep learning.
Regularisation methods like dropout are often good enough that you can build a network with too many parameters (for the amount of data you have) and rely upon the regularisation to find the subset of that network that is actually useful. People have recently got good results from also randomly dropping weights, or even whole layers.
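For anyone who hasn't seen dropout spelled out, here is a minimal sketch of the common "inverted dropout" variant in plain Python. The function name `dropout`, the drop probability `p`, and the example activations are all made up for illustration; real frameworks do this on tensors and switch it off at inference time.

```python
import random


def dropout(activations, p, rng):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    # Dividing the survivors by `keep` leaves the expected activation unchanged,
    # so no extra rescaling is needed when dropout is turned off.
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)             # seeded only so the sketch is repeatable
hidden = [0.5, 1.0, 1.5, 2.0]      # hypothetical hidden-layer activations
dropped = dropout(hidden, p=0.5, rng=rng)
```

Each training step draws a fresh random mask, so the network cannot lean on any single unit, which is roughly why an over-sized network can still generalise.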
Probably also through some grid search.
I've read (but don't remember where) that random search gives very good results, even better than grid search, and in less time.
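To make the grid vs. random comparison concrete, here is a rough sketch with the same budget of 25 trials for each. Everything in it is an assumption for illustration: `toy_loss` is a made-up stand-in for "train the network and return validation loss", and the ranges for learning rate and dropout rate are arbitrary.

```python
import itertools
import math
import random


def toy_loss(lr, dropout):
    # Hypothetical objective standing in for a real training run.
    return (math.log10(lr) + 3.0) ** 2 + (dropout - 0.3) ** 2


def grid_search(lrs, dropouts):
    # Evaluate every combination of the listed values.
    trials = [((lr, d), toy_loss(lr, d))
              for lr, d in itertools.product(lrs, dropouts)]
    return min(trials, key=lambda t: t[1]), len(trials)


def random_search(n_trials, rng):
    trials = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-5, -1)   # log-uniform over [1e-5, 1e-1]
        d = rng.uniform(0.0, 0.8)        # uniform over [0, 0.8]
        trials.append(((lr, d), toy_loss(lr, d)))
    return min(trials, key=lambda t: t[1]), len(trials)


grid_best, grid_n = grid_search([1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
                                [0.0, 0.2, 0.4, 0.6, 0.8])
rand_best, rand_n = random_search(25, random.Random(0))
print("grid  :", grid_n, "trials, best", grid_best)
print("random:", rand_n, "trials, best", rand_best)
```

With 25 trials the grid only ever tries 5 distinct values per hyperparameter, while random search tries 25 distinct values of each. That is the usual argument for random search when only a few of the hyperparameters really matter; which one wins on a given run still depends on the problem and the seed.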