HN Mirror



	by ssivark 902 days ago
	How crucial is it to freeze the learning rate schedule a priori, instead of tweaking it on the fly?

2 comments

minimaxir 902 days ago

Constant learning rates were the default in older ML implementations. Linear decay then became an obvious optimization, and now warmup combined with cosine decay handles most common training patterns, especially with the AdamW optimizer.
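To make the warmup-plus-cosine-decay shape concrete, here is a minimal sketch in plain Python. The function name `lr_at_step` and its parameters (`warmup_steps`, `peak_lr`, `min_lr`) are made up for illustration, and the linear warmup is just one common choice, not something the comment above specifies.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp: the first step gets peak_lr / warmup_steps,
        # the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    # Half a cosine period: starts at peak_lr, ends at min_lr.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Example: precompute the schedule for a 100-step run with 10 warmup steps.
schedule = [lr_at_step(s, total_steps=100, warmup_steps=10, peak_lr=1e-3) for s in range(100)]
```

Because the whole curve is a function of `step` and `total_steps`, this kind of schedule has to be fixed before training starts, which is what the original question is getting at.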

If the learning rate is too high at a given point in training, either a) the model stops learning or b) the gradients explode, which can destabilize the whole run.


grandma_tea 902 days ago

Adaptive learning rate is a thing. For example, one scheme I've used before is to decrease the learning rate if the validation loss stops decreasing.

It's not clear to me if this is applicable to LLMs though.
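The "decrease the learning rate if the validation loss stops decreasing" scheme can be sketched roughly like this. The class name `ReduceOnPlateau` and the defaults for `factor`, `patience`, and `min_lr` are assumptions chosen for illustration, not values from the comment above.

```python
class ReduceOnPlateau:
    """Hypothetical sketch: cut the learning rate when validation loss stalls."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr                # current learning rate
        self.factor = factor        # multiplier applied on each reduction
        self.patience = patience    # stalled evaluations tolerated before reducing
        self.min_lr = min_lr        # lower bound on the learning rate
        self.best = float("inf")    # best validation loss seen so far
        self.bad_evals = 0          # consecutive evaluations without improvement

    def step(self, val_loss):
        """Record one validation result and return the (possibly reduced) rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_evals = 0
        return self.lr


# Example: call step() once per validation pass with the measured loss.
scheduler = ReduceOnPlateau(lr=0.1)
for loss in [1.0, 0.9, 0.9, 0.9, 0.9]:
    current_lr = scheduler.step(loss)
```

Unlike a fixed schedule, this reacts to what training actually does, so nothing has to be frozen in advance. PyTorch ships a scheduler built on the same idea as `torch.optim.lr_scheduler.ReduceLROnPlateau`.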
