by bydgjohc 467 days ago
Any hypotheses on why the performance dropped suddenly while training?

Hi, other author here. I think the models converged on shallow/greedy strategies that improved performance up to a point, but are ultimately shortsighted, especially for harder puzzles.

Something interesting I noticed in the responses was that for shorter puzzles it would make deductions, building up a set of additional "clues" for itself, before answering the question. However, for harder puzzles with more clues, it would often merely repeat all the given clues and then try to answer the questions directly.

Maybe some form of curriculum learning would help, starting with easier puzzles and progressing to more challenging ones.
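To make the curriculum idea concrete, here is a rough sketch of what such a schedule could look like. This is not what we ran; the function name, the `num_clues` field, and the use of clue count as the difficulty measure are all assumptions for illustration.

```python
import random

def curriculum_batches(puzzles, num_stages=3, batches_per_stage=2, batch_size=4, seed=0):
    """Yield (stage, batch) pairs, admitting harder puzzles as stages advance.

    Hypothetical helper: difficulty is approximated by the number of clues.
    """
    rng = random.Random(seed)
    # Sort from easiest (fewest clues) to hardest (most clues).
    ordered = sorted(puzzles, key=lambda p: p["num_clues"])
    for stage in range(1, num_stages + 1):
        # Stage k samples from the easiest k/num_stages fraction of the puzzles.
        cutoff = max(1, len(ordered) * stage // num_stages)
        pool = ordered[:cutoff]
        for _ in range(batches_per_stage):
            yield stage, [rng.choice(pool) for _ in range(batch_size)]

# Toy data: nine puzzles with 1 to 9 clues each.
puzzles = [{"id": i, "num_clues": i} for i in range(1, 10)]
for stage, batch in curriculum_batches(puzzles):
    print(stage, [p["num_clues"] for p in batch])
```

Earlier stages never see the hardest puzzles, while later stages still mix in easy ones so the model keeps practicing the deduction behavior it showed on short puzzles.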

Other ideas to explore include:

- Distilling responses from stronger models
- Encouraging exploration with entropy regularization or reward shaping
- Training from base models instead of instruct models, like DeepSeek-R1-Zero
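For the entropy regularization idea, a minimal sketch of the general technique, assuming a policy-gradient style loss over a discrete distribution. The function names and the `beta` value are hypothetical and not taken from our training setup.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def loss_with_entropy_bonus(policy_loss, probs, beta=0.01):
    """Subtract a scaled entropy term so that more spread-out (exploratory)
    distributions get a lower loss. beta is an assumed coefficient."""
    return policy_loss - beta * entropy(probs)

uniform = [0.25, 0.25, 0.25, 0.25]  # maximally exploratory over 4 options
peaked = [1.0, 0.0, 0.0, 0.0]       # fully greedy

# With the same base loss, the uniform distribution ends up with the lower total.
print(loss_with_entropy_bonus(1.0, uniform), loss_with_entropy_bonus(1.0, peaked))
```

The bonus pushes against collapsing onto a single greedy strategy too early, which is the failure mode suspected here. Whether it would actually prevent the drop is untested.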

Is my understanding here correct? Could this be the reason?

https://news.ycombinator.com/item?id=43287312

As for why they dropped suddenly, I don't really know. Sometimes models develop degenerate behaviors, but even when forking from the best checkpoint and lowering the learning rate or changing other hyperparameters, performance still drops. It's as if its fate had already been sealed many iterations earlier.