| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by snyhlxde 768 days ago
	Thanks for interesting in our work! Yes we found training with consistency loss + AR loss on even a subset of a dataset results in significant speedup (0.01% pre-training cost). Training on more data permits even further speedup: the model is able to learn from more frequently-appearing collocations and phrases. For more details, please check out our paper and you can also see speedup saturates as the size of training data grows.