HN Mirror



	by kastnerkyle 1080 days ago
	What has changed since Adam? Learning rate scaling and schedules? I can't think of many big changes since ~2014; most of the setups from that era (grad clip + medium-ish LR, some ramp-up, or roll-off at the end) still work fine for me today. (Note: there have been many, many great optimization papers since 2014. I just don't see them show up in general open-source recipes very often.)
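For anyone who hasn't seen that kind of recipe written down, here is a rough sketch of the two pieces mentioned above: a learning rate that ramps up, stays flat, then rolls off at the end, plus gradient clipping by global norm. The function names (`lr_at`, `clip_by_global_norm`) and every number (`base_lr`, `warmup_steps`, `decay_steps`, `max_norm`) are made-up illustrations, not values from the comment, and the linear shapes are just one common choice.

```python
import math

def lr_at(step, total_steps, base_lr=3e-4, warmup_steps=1000, decay_steps=1000):
    """Linear ramp-up, flat middle, linear roll-off over the last decay_steps.

    All defaults are illustrative assumptions, not recommended settings.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    if step >= decay_start:
        # Roll off linearly to zero at total_steps.
        remaining = max(total_steps - step, 0)
        return base_lr * remaining / decay_steps
    return base_lr

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]
```

In a training loop you would call `lr_at(step, total_steps)` once per step, set the optimizer's learning rate to that value, and pass the gradients through `clip_by_global_norm` before the Adam update. Most frameworks ship their own versions of both, so this is only meant to show the shape of the recipe.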