Hacker News
by hashta 38 days ago
I think I've trained models with #params >> #training examples for hundreds of epochs, but I still don't recall seeing that loss curve on real data. Curious whether others have seen it with larger models or much longer runs.