Hacker News
by logicchains 773 days ago
So many papers play tricks with the learning rate schedule:
https://arxiv.org/abs/2307.06440