Hacker News
by minimaxir 1945 days ago
Overfitting on 17GB of input data would be interesting, even though it's using the "large" 774M GPT-2 model.
Training for a month may be too much.