| HN Mirror



	by applecrazy 2683 days ago
	You bring up a good point. Without seeing their code and training metrics, how do we know that this isn’t some extremely overfitted model?

1 comment

From the paper:

"All models still underfit WebText and held-out perplexity has as of yet improved given more training time."
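
To make the overfit/underfit distinction concrete, here is a minimal sketch of the kind of check the training metrics would allow. It is not from the paper: the function names, the two-checkpoint heuristic, and the loss values in the usage lines are all hypothetical, and it assumes losses are logged as mean per-token negative log-likelihood in nats.

```python
import math


def perplexity(mean_nll):
    """Perplexity from mean per-token negative log-likelihood (nats)."""
    return math.exp(mean_nll)


def fit_status(train_nll, heldout_nll):
    """Rough heuristic comparing the last two checkpoints (hypothetical helper).

    train_nll, heldout_nll: lists of mean per-token NLL, one entry per checkpoint.
    """
    d_train = train_nll[-1] - train_nll[-2]
    d_held = heldout_nll[-1] - heldout_nll[-2]
    if d_held < 0:
        # Held-out loss is still falling: more training still helps.
        return "underfitting"
    if d_train < 0 and d_held > 0:
        # Training loss falls while held-out loss rises.
        return "overfitting"
    return "plateau"


# Made-up numbers, for illustration only.
print(fit_status([3.2, 3.0], [3.3, 3.1]))
print(fit_status([3.0, 2.5], [3.1, 3.4]))
```

Under this reading, the quoted sentence corresponds to the first case: held-out perplexity was still improving with more training time, which is the opposite of what an overfitted model would show.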