	by GardenLetter27 130 days ago
	Reinforcement learning changes this, though. Remember Move 37? The catch is that you need verifiable rewards for that (and a well-set-up environment), and it's hard to design rewards that cover everything humans want (security, simplicity, performance, readability, etc.).
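
To make the reward problem concrete, here is a minimal sketch of what a "verifiable" reward for generated code might look like. Everything in it is an assumption for illustration: the `reward` function, the `solve` entry-point name, the 0.8/0.2 weights, and the AST-size proxy for simplicity are all hypothetical and not taken from any real RL framework. Correctness is easy to check with tests, simplicity only gets a crude proxy, and security and readability have no obvious automatic check at all.

```python
import ast


def reward(source: str, tests: list, func_name: str = "solve") -> float:
    """Hypothetical composite reward for a generated function, in [0, 1].

    `tests` is a list of (args_tuple, expected_result) pairs.
    """
    if not tests:
        return 0.0

    # Verifiable part 1: the candidate must at least parse.
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return 0.0

    # Verifiable part 2: run the candidate against the tests.
    # NOTE: exec() on untrusted code is unsafe; a real environment
    # would need a sandbox. This is only a sketch.
    namespace: dict = {}
    try:
        exec(compile(tree, "<candidate>", "exec"), namespace)
        fn = namespace[func_name]
        passed = sum(1 for args, expected in tests if fn(*args) == expected)
    except Exception:
        return 0.0
    correctness = passed / len(tests)

    # Crude proxy for "simplicity": fewer AST nodes scores higher.
    # The constant 50 is arbitrary.
    node_count = sum(1 for _ in ast.walk(tree))
    simplicity = 1.0 / (1.0 + node_count / 50.0)

    # Security and readability are deliberately absent: there is no
    # cheap, reliable check for them, which is the point being made.
    return 0.8 * correctness + 0.2 * simplicity


if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0), ((2, 2), 4)]
    good = "def solve(a, b):\n    return a + b\n"
    bad = "def solve(a, b):\n    return a - b\n"
    print(reward(good, tests), reward(bad, tests))
```

A policy trained against something like this would be pushed toward short code that passes the tests, and nothing in the signal stops it from being insecure or unreadable.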