by eggie5 1949 days ago
You missed a big part: they did a big NAS run to make it work.

Where did you see that they used NAS? Their preliminary results show it works even for the baseline model.

They did a lot of manual hyperparameter optimization and spent a fair amount of time unpacking the rationale for their choices, including a negative results section (!) in the appendix.