by yazr 2707 days ago
Is BERT computationally (and sample-wise) equivalent to previous SOTA?

(I do DRL but not NLP)

I sometimes read these DL papers, and the requirements are not really feasible if you have to re-implement them in a modified domain.

1 comment

BERT is more computationally expensive. It might end up giving better results on the task mentioned in the paper, but we don't know. At the time of writing, all of the contextual word embedding techniques were fairly new and had not been tried.