Hacker News
No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-Based Language Models (arxiv.org)
11 points by froster 1058 days ago | 1 comment
froster 1058 days ago
This recent paper highlights how difficult it is to create a new optimizer that works as a drop-in replacement. Sophia and Lion were recently proposed as superior alternatives to Adam, but both appeared worse in an independent evaluation.