by kyle_grove 2420 days ago
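In fact, one of the chief advantages of the BERT/Transformer architecture over ELMo/LSTM is that it parallelizes well: self-attention processes all the tokens in a sequence at once, whereas an LSTM has to step through them one at a time.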
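
To make the difference concrete, here is a rough NumPy sketch. It is toy code under stated assumptions, not taken from the BERT or ELMo implementations: a plain tanh RNN stands in for the LSTM, the attention is single-head with no masking, and all names (`rnn_forward`, `self_attention`, the weight matrices) are made up for illustration. The recurrent version needs a loop because each step depends on the previous hidden state; the attention version is a handful of matrix multiplies over the whole sequence.

```python
import numpy as np

def rnn_forward(x, W_x, W_h):
    """Toy tanh RNN (stand-in for an LSTM): step t needs the state from step t-1."""
    h = np.zeros(W_h.shape[0])
    states = []
    for t in range(x.shape[0]):  # sequential over time steps, cannot be batched over t
        h = np.tanh(x[t] @ W_x + h @ W_h)
        states.append(h)
    return np.stack(states)

def self_attention(x, W_q, W_k, W_v):
    """Toy single-head scaled dot-product attention: all positions in one shot."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

# Hypothetical sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
T, d = 6, 4
x = rng.normal(size=(T, d))
W_x, W_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

rnn_out = rnn_forward(x, W_x, W_h)            # shape (T, d), built step by step
attn_out = self_attention(x, W_q, W_k, W_v)   # shape (T, d), built with matmuls
```

Each row of `attn_out` can be computed independently of the other rows, which is what lets a GPU work on every position at the same time; row `t` of `rnn_out` cannot be computed until row `t-1` is done.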