| HN Mirror



	by minch 1566 days ago
	Thanks! The demo just shows the final agents after training (30K gradient updates). Interesting work re the reward-maximizing curricula. I had not seen this before, so thanks for the pointer.