Hacker News
by minch 1566 days ago
Thanks! The demo just shows the final agents after training (30K gradient updates). Interesting work re the reward-maximizing curricula. I hadn't seen this before, so thanks for the pointer.