No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-Based Language Models


	No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-Based Language Models (arxiv.org)
	11 points by froster 1103 days ago

1 comment

froster 1103 days ago

A recent paper highlights the difficulty of creating a new optimizer as a drop-in replacement. Sophia and Lion were recently proposed as superior alternatives to Adam, but they appeared worse in an independent eval.
