by vedant 2674 days ago
From the paper:

"All models still underfit WebText and held-out perplexity has as of yet improved given more training time."