Hacker News
by uh_uh 1141 days ago
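Previous approaches like LSTMs struggled to learn long-term dependencies, because information from an early token has to survive every recurrent step in between. The transformer improved on this greatly: self-attention lets any position look directly at any other position, no matter how far apart they are.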
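To make the difference concrete, here is a rough toy sketch in plain Python. It is not a real LSTM or transformer: `recurrent_influence`, the `decay` factor of 0.9, and the hand-picked 2-dimensional keys and query are all assumptions made up for illustration. It only contrasts a signal that shrinks at every recurrent step with a scaled dot-product attention weight that depends on content match rather than distance.

```python
import math


def recurrent_influence(distance, decay=0.9):
    """Toy stand-in for a recurrent chain (hypothetical, not a real LSTM):
    each step scales the carried signal by `decay`, so the influence of
    an input shrinks exponentially with its distance."""
    return decay ** distance


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# 100 positions; only position 0 has a key that matches the query
# issued by the last position (values chosen by hand for the demo).
seq_len = 100
keys = [[0.0, 0.0] for _ in range(seq_len)]
keys[0] = [8.0, 0.0]
query = [8.0, 0.0]

weights = attention_weights(query, keys)

# Influence of position 0 on the last position in each toy model.
print("recurrent chain:", recurrent_influence(seq_len - 1))
print("attention weight:", weights[0])
```

In this setup the recurrent influence is tiny after 99 steps, while the attention weight on position 0 stays close to 1 because it is computed in a single step from the query and key alone.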