Hacker News
by bradhilton 467 days ago
As for why they dropped suddenly, I don't really know. Sometimes models develop degenerate behaviors, but even when forking from the best checkpoint and lowering the learning rate or changing other hyperparameters, performance still drops. It's as if the run's fate had already been sealed many iterations earlier.