| HN Mirror


	by HarHarVeryFunny 2 hours ago
	This is nothing new - these companies don't want their models' output to be useful for distillation/training, so they just give a "summary" of the thinking steps rather than the actual sequence. RL (the basis of LLM "thinking") is a pretty crude way to achieve the appearance of reasoning, given that it reinforces all the steps, including missteps, that led to a reward. Providing a summary could be seen as a form of sane-washing, making the model look more purposeful and directed than it really is!
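	To make the "reinforces all the steps, including missteps" point concrete, here is a rough sketch, assuming a plain outcome-only reward with a single scalar baseline and no per-step credit assignment. The function name, the example trajectory, and the numbers are all made up for illustration; actual training setups at these companies may differ.

```python
def per_step_weights(steps, reward, baseline=0.0):
    """Outcome-only credit: every step in the sampled trajectory
    gets the same weight (reward - baseline), whether or not it helped."""
    advantage = reward - baseline
    return [(step, advantage) for step in steps]


# Hypothetical reasoning trace that ends in a correct (rewarded) answer.
trajectory = [
    "try factoring the expression",      # dead end
    "give up on that, substitute u = x + 1",
    "solve for u, then for x",
    "final answer: x = 3",
]

weights = per_step_weights(trajectory, reward=1.0, baseline=0.5)

for step, weight in weights:
    print(f"{weight:+.2f}  {step}")
```

	The dead-end first step ends up with exactly the same positive weight as the steps that actually produced the answer, which is the crudeness being described.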