Hacker News
by minimaxir 1945 days ago
Overfitting on 17GB of input data would be interesting, even though it's using the "large" 774M GPT-2 model.

Training for a month may be too much.