HN Mirror


by yazr 2754 days ago

Is BERT comparable to the previous SOTA in computational cost and sample efficiency?

(I do DRL but not NLP)

I sometimes read these DL papers and the requirements are not really feasible if you have to re-implement them in a modified domain.

1 comment

irodov_rg 2754 days ago

BERT is more computationally expensive. It might end up giving better results on the task mentioned in the paper, but we don't know. At the time of writing, all of the contextual word embedding techniques were fairly new and had not been tried.
