by shoubidouwah 476 days ago
I wonder if there might not be an opportunity for a warmup-based mask inversion: for the first few epochs, only apply the momentum where it agrees with the instantaneous gradient - after that, invert the mask, since the momentum would technically have more info?
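
To make the idea concrete, here is a rough sketch of one possible reading, not anything taken from the original method. It assumes "agreeing" means the momentum and the current gradient share a sign per coordinate, and that "inverting" means keeping only the coordinates where they disagree. The function name `masked_update` and the `warmup_epochs` parameter are made up for illustration.

```python
def masked_update(momentum, grad, epoch, warmup_epochs=3):
    """Per-coordinate masked momentum update (hypothetical sketch).

    During warmup, keep momentum only where its sign matches the
    instantaneous gradient. After warmup, invert the mask and keep
    only the coordinates where the signs do not match.
    """
    update = []
    for m, g in zip(momentum, grad):
        agree = m * g > 0  # same sign, both non-zero (assumed definition)
        keep = agree if epoch < warmup_epochs else not agree
        update.append(m if keep else 0.0)
    return update


momentum = [1.0, -2.0, 0.5]
grad = [0.3, 1.0, 0.2]

early = masked_update(momentum, grad, epoch=0)  # warmup: agreeing coords only
late = masked_update(momentum, grad, epoch=5)   # after warmup: mask inverted
```

With these toy values, the second coordinate is the only one where the signs disagree, so it is dropped during warmup and is the only one kept afterwards.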

In any case, good idea - reminds me of the "apply same gradient multiple times" trick from a few years ago. May have weird behaviours at low batch sizes though...