|
|
|
|
|
by snyhlxde
768 days ago
|
|
Thanks for interesting in our work! Yes we found training with consistency loss + AR loss on even a subset of a dataset results in significant speedup (0.01% pre-training cost). Training on more data permits even further speedup: the model is able to learn from more frequently-appearing collocations and phrases. For more details, please check out our paper and you can also see speedup saturates as the size of training data grows. |
|