by kastnerkyle
1033 days ago
What has changed since Adam? Learning rate scales / schedules? I can't think of many big changes since ~2014; most of the setups from that era (grad clip + medium-ish LR, some ramp-up or roll-off at the end) still work fine for me today. (Note: there have been many, many great optimization papers since 2014. I just don't see them show up in general open-source recipes very often.)
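For anyone unsure what that kind of recipe looks like, here is a rough sketch in plain Python of the two non-Adam pieces: a warmup-then-cosine learning rate schedule and global-norm gradient clipping. The names `lr_at_step` and `clip_by_global_norm` and all the default values (peak LR of 3e-4, 1000 warmup steps, decay to 10% of peak, clip at norm 1.0) are assumptions picked for illustration, not settings from any particular codebase.

```python
import math


def lr_at_step(step, total_steps, peak_lr=3e-4, warmup_steps=1000, final_frac=0.1):
    """Linear ramp-up to peak_lr, then cosine roll-off to final_frac * peak_lr.

    All defaults are illustrative assumptions, not recommended values.
    """
    if step < warmup_steps:
        # Ramp up linearly so the first steps use a small learning rate.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (final_frac + (1.0 - final_frac) * cosine)


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]
```

In a training loop you would compute the gradients, pass them through `clip_by_global_norm`, and then take an Adam step using `lr_at_step(step, total_steps)` as the learning rate for that step.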