HN Mirror



	by InvidFlower 468 days ago
	Well, I'm not sure that part matters as much (from first principles). The more important part is that RL lets a model figure out which methods are actually effective for it. Most of the time it probably already has the tools from pre-training, but doesn't "make the connection" to use them (or at least not often enough).
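	A toy way to picture that claim, as a sketch only: treat the model as a softmax policy over a fixed menu of K hypothetical "methods" it already knows. The pre-training prior (the `prior_logits`, an assumption) favors a weak method, and each method has a made-up success rate. Plain REINFORCE with a running-average baseline then shifts probability toward the method that actually earns reward, without teaching any new method. This is a multi-armed-bandit illustration, not how RL on language models is implemented in practice; all names and numbers here are illustrative.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(prior_logits, success_prob, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE over K fixed 'methods' (hypothetical setup).

    prior_logits: initial preferences, standing in for the pre-training prior.
    success_prob: assumed probability that each method earns reward 1.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    logits = list(prior_logits)
    baseline = 0.0  # running average of reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a method from the current policy.
        a = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < success_prob[a] else 0.0
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log softmax w.r.t. logit i is onehot(a)[i] - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


# Method 0 is what the prior prefers; method 2 works best but is rarely picked.
prior_logits = [2.0, 0.0, 0.0]
success_prob = [0.2, 0.5, 0.8]
before = softmax(prior_logits)
after = reinforce_bandit(prior_logits, success_prob)
print("before:", [round(p, 3) for p in before])
print("after: ", [round(p, 3) for p in after])
```

	The point of the toy: the "tool" (method 2) was in the menu from the start with about 11% probability; reward only changes how often it gets used.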