Hacker News
by
bra-ket
3219 days ago
Does this mean we don't need gradient descent after all to achieve the same result?
1 comment
sanxiyn
3219 days ago
Nope, the Transformer is still trained with gradient descent.