Hacker News
by minch, 1566 days ago
Thanks! The demo just shows the final agents after training (30K gradient updates). Interesting work on reward-maximizing curricula. I hadn't seen it before, so thanks for the pointer.