by nil-sec 2053 days ago
Agreed. I trained a 3D version of b0-b2 on a classification task I worked on, and besides being very slow to train, they did not outperform a simple baseline VGG architecture. Interestingly, training time improved a lot once I set the cuDNN benchmark flags in PyTorch. I haven't seen a comparable reduction in training time for any other architecture, but for b2 it went from about 12 s per iteration to about 0.8 s. I guess there is more room for optimization in NAS models.
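For anyone who wants to try this, here is a minimal sketch. It assumes the flag in question is `torch.backends.cudnn.benchmark`, which the comment does not name explicitly. The tiny 3D network and the `seconds_per_iteration` helper are hypothetical stand-ins for illustration, not the actual b0-b2 setup.

```python
import time

import torch
import torch.nn as nn

# Assumption: this is the "cudnn benchmark" flag meant above. It lets cuDNN
# time several convolution algorithms per input shape and cache the fastest.
torch.backends.cudnn.benchmark = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for a 3D classification network.
model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
).to(device)


def seconds_per_iteration(model, batch, n_iters=3):
    """Average wall-clock time of one forward/backward pass."""
    # Warm-up pass, so any one-off algorithm search is not counted.
    model(batch).sum().backward()
    if batch.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        model.zero_grad()
        model(batch).sum().backward()
    if batch.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters


batch = torch.randn(2, 1, 16, 16, 16, device=device)
print(f"{seconds_per_iteration(model, batch):.4f} s per iteration")
```

The flag only has an effect on a CUDA device with cuDNN, and it tends to help when input shapes stay fixed between iterations. With varying shapes the algorithm search is repeated for each new shape and can make things slower.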