by ekelsen 2672 days ago
I think the goal should be to use the smallest dense network possible as the baseline. For MNIST, this might be a LeNet-style convnet with [3, 9, 50] instead of the standard [20, 50, 500] network, which is way overkill.
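
To put rough numbers on that gap, here is a minimal sketch that counts parameters for both widths. It assumes the three numbers are the two conv layer widths and the hidden fully connected width, and it assumes the usual Caffe-style LeNet layout (28x28 grayscale input, 5x5 convolutions with no padding, 2x2 pooling after each conv, 10 output classes). None of those details are stated above, and `lenet_param_count` is just a hypothetical helper name.

```python
def lenet_param_count(widths, in_size=28, in_channels=1,
                      kernel=5, pool=2, classes=10):
    """Count weights and biases of a LeNet-style convnet.

    widths = (conv1 filters, conv2 filters, hidden FC units).
    The layout (5x5 valid convs, 2x2 pooling, 10 classes) is an
    assumption, not something specified in the comment.
    """
    conv1, conv2, fc = widths
    total = 0
    size, channels = in_size, in_channels
    for out_channels in (conv1, conv2):
        # conv weights plus one bias per output channel
        total += kernel * kernel * channels * out_channels + out_channels
        # valid convolution shrinks the map, pooling then divides it
        size = (size - kernel + 1) // pool
        channels = out_channels
    flat = channels * size * size
    total += flat * fc + fc          # hidden fully connected layer
    total += fc * classes + classes  # output layer
    return total


standard = lenet_param_count((20, 50, 500))
small = lenet_param_count((3, 9, 50))
print(standard, small, round(standard / small, 1))
```

Under those assumptions the standard network has about 431k parameters and the small one about 8.5k, so the dense baseline is already roughly 50x smaller before any pruning.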

I haven't explored this on CIFAR, but my guess is that using a more efficient architecture like MobileNetV2 would yield results that are more likely to transfer.

The general theme is that you should be using the smallest dense model you possibly can as a baseline.