HN Mirror



	by gwern 2601 days ago
	It's a constant, yes. We haven't tried any other learning-rate schedules (for my poetry GPT-2s, I simply cut the LR by 10x every day or so). I have no idea whether this is optimal for transfer learning.
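	For readers unfamiliar with the distinction, here is a minimal Python sketch of the two schedules mentioned: a constant learning rate versus a step decay that divides the LR by 10 after each full day of training. The function names, the 24-hour period, and the example base LR are illustrative assumptions. The comment says "every day or so", which suggests the drops were made by hand rather than by an automated schedule like this one.

```python
import math


def constant_lr(base_lr: float, elapsed_hours: float) -> float:
    """Constant schedule: the LR never changes during training."""
    return base_lr


def step_decay_lr(base_lr: float, elapsed_hours: float,
                  period_hours: float = 24.0, factor: float = 10.0) -> float:
    """Step-decay schedule (illustrative assumption: one drop per 24 h).

    The LR is divided by `factor` once for every completed
    `period_hours` of training time.
    """
    if elapsed_hours < 0:
        raise ValueError("elapsed_hours must be non-negative")
    drops = math.floor(elapsed_hours / period_hours)  # completed periods so far
    return base_lr / (factor ** drops)


if __name__ == "__main__":
    base = 1e-4  # hypothetical starting LR, not taken from the comment
    for hours in (0, 12, 24, 36, 48):
        print(hours, constant_lr(base, hours), step_decay_lr(base, hours))
```

	Under these assumptions, a run started at 1e-4 stays there for the first day, then moves to roughly 1e-5 on day two and 1e-6 on day three, while the constant schedule stays at 1e-4 throughout.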