Hacker News
by a2128 292 days ago
From my personal experience training models, this is only true when the parameter count is a limiting factor. Once the model is past a certain size, curriculum learning doesn't really lead to much improvement. I believe most research also applies it only to small models (e.g. Phi).