Hacker News
by jdub 33 days ago
Reinforcement learning for "reasoning" perturbs the model to generate completions in a particular chain of thought / alternative selection structure. It's three next token predictors in a trench coat.
2 comments

When these things start solving many more long-standing problems, and start introducing more novel problems, will people finally admit that the "next token predictor" is not the gotcha they think it is?
It's not a gotcha. It's incredible what these things can do despite being next token predictors from a weird dataset. That's at the heart of the "bitter lesson", and you don't have to believe in magic to see it.
> Some people like to parrot "next token prediction", "LLMs can only interpolate", and other nonsense

Thank you for illustrating my point.