Hacker News
by kastnerkyle 1033 days ago
What has changed since Adam? Learning rate scaling / schedules? I can't think of many big changes since ~2014; most of the setups from that era (grad clip + medium-ish LR, some ramp-up, or roll-off at the end) still work fine for me today.

(Note: there have been many, many great optimization papers since 2014. I just don't see them show up in general open source recipes very often.)
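
For anyone unfamiliar with the kind of setup described above, here is a rough pure-Python sketch of it: Adam, global-norm gradient clipping, a linear ramp-up, and a linear roll-off at the end. The function names (`lr_at`, `clip_by_global_norm`, `adam_step`, `train`), the toy quadratic objective, and every numeric value (base LR, warmup and roll-off lengths, clip norm) are assumptions picked for illustration, not anything from the comment itself.

```python
import math


def lr_at(step, total_steps, base_lr=1e-2, warmup=100, rolloff=200):
    """Linear ramp-up, flat middle, linear roll-off toward zero at the end.

    All defaults are illustrative assumptions, not recommended values.
    """
    if step < warmup:
        return base_lr * (step + 1) / warmup
    if step >= total_steps - rolloff:
        return base_lr * (total_steps - step) / rolloff
    return base_lr


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale the whole gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grads]


def adam_step(params, grads, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    """One in-place Adam update; t is the 1-based step count."""
    for i, g in enumerate(grads):
        m[i] = b1 * m[i] + (1 - b1) * g          # first-moment estimate
        v[i] = b2 * v[i] + (1 - b2) * g * g      # second-moment estimate
        m_hat = m[i] / (1 - b1 ** t)             # bias correction
        v_hat = v[i] / (1 - b2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def train(total_steps=1000):
    """Minimize a toy objective f(x) = sum((x - target)^2) with the recipe."""
    target = [1.0, -2.0]
    params = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for step in range(total_steps):
        grads = [2.0 * (p - t) for p, t in zip(params, target)]
        grads = clip_by_global_norm(grads, max_norm=1.0)
        adam_step(params, grads, m, v, step + 1, lr_at(step, total_steps))
    return params


if __name__ == "__main__":
    print(train())
```

In a real framework the same three pieces would normally come from the library's own optimizer, clipping utility, and scheduler; the hand-rolled version is only meant to show how little there is to the recipe.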