

	by DevelopingElk 240 days ago
	RL before LLMs could very much learn new behaviors; look at AlphaGo for an example. It can also learn to drive in simulated environments. RL as applied to LLMs doesn't learn in the same way, so it can't create its own behaviors.