| HN Mirror



	by craffel 2433 days ago
	It can actually be more pernicious than that: https://arxiv.org/abs/1802.08232. However, note that the dataset used to train GPT-2 is about 20x smaller than C4. I'm not 100% sure how many times the training set was repeated over the course of training for GPT-2, but it was likely many times. I stand by my statement (that memorization is unlikely with SGD and no repetition of training data), but I would be happy to be proven otherwise.