by gwern 2601 days ago
It's a constant, yes. We haven't tried any other learning rate schedules (for my poetry GPT-2s, I simply drop the LR 10x each day or so). I have no idea if this is optimal for transfer learning or not.
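For readers unfamiliar with the terms, here is a minimal sketch contrasting the two schedules mentioned: a constant learning rate, and a manual step decay that divides the LR by 10 about once a day of wall-clock training. The function names, the 24-hour interval, and the starting LR of 1e-4 are illustrative assumptions and do not come from the comment. The drops described above were done by hand, not by this code.

```python
def constant_lr(initial_lr: float, elapsed_hours: float) -> float:
    """Constant schedule: the learning rate never changes."""
    return initial_lr


def step_decay_lr(initial_lr: float, elapsed_hours: float,
                  hours_per_drop: float = 24.0, factor: float = 10.0) -> float:
    """Step decay: divide the LR by `factor` once every `hours_per_drop` hours.

    The 24-hour interval and 10x factor mirror "drop the LR 10x each day or so";
    both defaults are assumptions for illustration.
    """
    drops = int(elapsed_hours // hours_per_drop)  # number of completed "days"
    return initial_lr / (factor ** drops)


if __name__ == "__main__":
    base_lr = 1e-4  # hypothetical starting LR, not taken from the comment
    for day in range(4):
        hours = day * 24
        print(f"day {day}: constant={constant_lr(base_lr, hours):.0e}, "
              f"step={step_decay_lr(base_lr, hours):.0e}")
```

Step decay is the simplest schedule to run by hand: watch the loss plateau, then cut the LR by a fixed factor. Whether either schedule suits transfer learning remains an open question, as the comment says.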