| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by chaos_emergent 316 days ago
	> This is likely because LLMs are typically trained for only a single epoch over massive datasets, which is in contrast to the multi-hundred-epoch training regimes for which dropout was first introduced. Wait, is this true? That seems like a wild statement to make, relatively unsubstantiated?

1 comments

No this is well known. Look for Table 2.2 in GPT3 paper.

Thank you, that was a wild thing to learn!