This is interesting, thanks! Is there anything else you can tell me about the results of your experiments with small networks? I am really interested in this.
For example, did you notice that increasing or decreasing network size required significant changes to other hyperparameters? Do small networks learn faster at the beginning of training, before they start to plateau?