Hacker News
by sanxiyn 3218 days ago
Nope, the Transformer is still trained with gradient descent.
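To make that concrete, here is a rough sketch of plain gradient descent on a single toy self-attention layer, the core building block of a Transformer. Everything in it is an assumption for illustration: the 3-token input, the target, the 2x2 weight matrices, the learning rate, and the helper names (`attention`, `numerical_grad`, `train`) are all made up. It also estimates gradients by finite differences to stay dependency-free, whereas real training computes them with backpropagation; the update rule itself is the same.

```python
import math
import random

def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(wq[0])
    kt = [list(col) for col in zip(*k)]
    scores = matmul(q, kt)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)

def loss(params, x, target):
    # Mean squared error between the attention output and the target.
    wq, wk, wv = params
    out = attention(x, wq, wk, wv)
    errs = [(o - t) ** 2 for ro, rt in zip(out, target) for o, t in zip(ro, rt)]
    return sum(errs) / len(errs)

def numerical_grad(params, x, target, eps=1e-5):
    # Central finite differences; a stand-in for backpropagation (assumption).
    grads = []
    for w in params:
        g = [[0.0] * len(w[0]) for _ in w]
        for i in range(len(w)):
            for j in range(len(w[0])):
                old = w[i][j]
                w[i][j] = old + eps
                up = loss(params, x, target)
                w[i][j] = old - eps
                down = loss(params, x, target)
                w[i][j] = old
                g[i][j] = (up - down) / (2 * eps)
        grads.append(g)
    return grads

def train(steps=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Hypothetical toy data: 3 tokens with 2 features each.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    target = [[0.5, -0.5], [-0.5, 0.5], [0.0, 0.0]]
    # Three 2x2 weight matrices: query, key, value.
    params = [[[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
              for _ in range(3)]
    losses = [loss(params, x, target)]
    for _ in range(steps):
        grads = numerical_grad(params, x, target)
        # The gradient descent update: w <- w - lr * dL/dw
        for w, g in zip(params, grads):
            for i in range(len(w)):
                for j in range(len(w[0])):
                    w[i][j] -= lr * g[i][j]
        losses.append(loss(params, x, target))
    return losses

if __name__ == "__main__":
    history = train()
    print("initial loss:", history[0])
    print("final loss:  ", history[-1])
```

Real Transformers use backprop and usually an optimizer like Adam rather than this bare update, but those are still gradient-based methods.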