|
|
|
|
|
by octbash
2314 days ago
|
|
One is a language generation model, the other is a fill-in-the-blank model. It sounds like they might be similar, but in practice they are different enough objectives (and in particular the "bi-directional" aspect of BERT-type models) that the models learn different things. |
|