by irodov_rg
2711 days ago
BERT is more computationally expensive. It might give better results on the task mentioned in the paper, but we don't know.
When this was written, all of the contextual word embedding techniques were fairly new and had not been tried.