Hacker News
by sanxiyn 3218 days ago
Nope, the Transformer is still trained with gradient descent.