by applecrazy 2683 days ago
You bring up a good point. Without seeing their code and training metrics, how do we know that this isn’t some extremely overfitted model?
1 comment

From the paper:

"All models still underfit WebText and held-out perplexity has as of yet improved given more training time."
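
For anyone wondering how a claim like that would be checked: here is a rough sketch, not the paper's code. It assumes perplexity is computed as the exponential of the mean per-token negative log-likelihood (in nats). The function names and the numbers are hypothetical, made up for illustration only.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def still_improving(heldout_ppl_by_checkpoint, tol=0.0):
    """True if held-out perplexity dropped by more than tol at every successive checkpoint."""
    pairs = zip(heldout_ppl_by_checkpoint, heldout_ppl_by_checkpoint[1:])
    return all(later < earlier - tol for earlier, later in pairs)

# Hypothetical held-out perplexities at successive checkpoints (not real data).
heldout_curve = [30.0, 25.0, 22.0, 20.5]
print(still_improving(heldout_curve))
```

An overfit model would show the opposite pattern: training perplexity keeps falling while held-out perplexity flattens or starts to rise.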