

	by jdub 33 days ago
	Reinforcement learning for "reasoning" perturbs the model to generate completions in a particular chain-of-thought / alternative-selection structure. It's three next-token predictors in a trench coat.


munksbeer 33 days ago

When these things start solving many more long standing problems, and start introducing more novel problems, will people finally admit that the "next token predictor" is not the gotcha they think it is?


jdub 33 days ago

It's not a gotcha. It's incredible what these things can do despite being next token predictors from a weird dataset. That's at the heart of the "bitter lesson", and you don't have to believe in magic to see it.


charleshn 33 days ago

> Some people like to parrot "next token prediction", "LLMs can only interpolate", and other nonsense

Thank you for illustrating my point.
