HN Mirror



	by beering 66 days ago
	My experience has been that this isn't generally true, mainly because worse models chase red herrings or get confused and stuck. A better model will reach the correct solution in fewer tokens, and my surface-level understanding of how RL works supports this.