by shoubidouwah
476 days ago
I wonder if there might not be an opportunity for a warmup-based mask inversion: for the first few epochs, only apply the momentum where it agrees with the instantaneous gradient; after that, invert the mask, since the momentum would technically carry more information? In any case, good idea. It reminds me of the "apply the same gradient multiple times" trick from a few years ago. May have weird behaviours at low batch sizes though...
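
To make that concrete, here is a rough sketch of one possible reading of the idea, in plain Python. Everything in it is an assumption for illustration: the function name `masked_momentum_step`, the `warmup_steps` cutoff, and the choice of an EMA-style momentum are mine, not taken from the original method or any library. "Agreeing" is taken to mean that the momentum and the current gradient share a sign on a given coordinate.

```python
def masked_momentum_step(params, grads, momentum, step,
                         lr=0.1, beta=0.9, warmup_steps=100):
    """One momentum step with a sign-agreement mask that flips after warmup.

    Hypothetical sketch: during warmup only coordinates where momentum and
    the current gradient agree in sign are updated; afterwards the mask is
    inverted, so only the disagreeing coordinates are updated.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Exponential moving average of gradients (assumed momentum form).
        m = beta * m + (1.0 - beta) * g
        # True when momentum and instantaneous gradient point the same way.
        agrees = m * g > 0
        # Warmup: keep agreeing coordinates. After warmup: keep the others.
        keep = agrees if step < warmup_steps else not agrees
        new_params.append(p - lr * m if keep else p)
        new_momentum.append(m)
    return new_params, new_momentum


# Two coordinates: the first agrees with its momentum, the second does not.
warm, _ = masked_momentum_step([1.0, 1.0], [1.0, -1.0], [1.0, 1.0],
                               step=0, warmup_steps=10)
late, _ = masked_momentum_step([1.0, 1.0], [1.0, -1.0], [1.0, 1.0],
                               step=10, warmup_steps=10)
```

In this toy call, `warm` moves only the first coordinate and `late` moves only the second. Whether a hard flip like this is sensible, or whether the mask should be faded out instead, is exactly the open question; at low batch sizes the per-coordinate sign of the gradient is noisy, so the mask would flicker a lot.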