by eggie5 1954 days ago
So what do they attribute the gains to? Adaptive clipping? Or $$$ spent on NAS?
1 comment

They do a small study in Section 4.1 comparing batchnorm to adaptive gradient clipping (AGC) for ResNets over a range of hyperparameters, and they also compare performance against batchnorm versions in Table 6. The results indicate AGC does give a real boost over batchnorm.
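For anyone who hasn't seen AGC before, here's a rough NumPy sketch of the idea as I understand it: each unit's gradient is rescaled whenever its norm gets too large relative to the norm of that unit's weights. This is my own illustration, not the authors' code. The function names, the default `clip` and `eps` values, and the choice of the first axis as the "unit" axis are all assumptions on my part.

```python
import numpy as np

def unitwise_norm(x):
    """L2 norm per unit. Assumes axis 0 indexes output units (an assumption
    about layout); scalars and vectors are treated as a single unit."""
    if x.ndim <= 1:
        return np.linalg.norm(x)
    # Reduce over every axis except the first, keeping dims for broadcasting.
    axes = tuple(range(1, x.ndim))
    return np.sqrt(np.sum(x ** 2, axis=axes, keepdims=True))

def adaptive_grad_clip(grad, weight, clip=0.01, eps=1e-3):
    """Rescale grad unit-wise so that ||grad|| <= clip * max(||weight||, eps).
    clip and eps are illustrative placeholders, not tuned values."""
    w_norm = np.maximum(unitwise_norm(weight), eps)  # avoid freezing zero-init weights
    g_norm = unitwise_norm(grad)
    max_norm = clip * w_norm
    # Guard against division by zero for all-zero gradients.
    scale = max_norm / np.maximum(g_norm, 1e-6)
    return np.where(g_norm > max_norm, grad * scale, grad)

weight = np.ones((2, 4))
grad = np.array([[3.0, 0.0, 0.0, 0.0],
                 [0.01, 0.0, 0.0, 0.0]])
clipped = adaptive_grad_clip(grad, weight)
```

In this toy case the first row's gradient is large relative to its weights and gets scaled down, while the second row is already under the threshold and passes through unchanged.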

They do a bunch of manual hyperparameter tuning that seems necessary to get the state-of-the-art results. From my reading, it doesn't seem like they actually used NAS, just that the baseline they compare to was found with NAS.