by irodov_rg 2711 days ago
BERT is more computationally expensive. It might give better results on the task mentioned in the paper, but we don't know. At the time of writing, all of the contextual word embedding techniques were fairly new and had not been tried.