by pama 1165 days ago
I’m working in a related area and I’m rather curious about this point. In what way is GPT-4 overfit? Does "overfit" in this context mean the conventional thing, i.e. validation loss went up with additional training, or something special?
1 comment

More specifically, validation loss is irrelevant when you can't even sample out of distribution anymore.