by cubefox 53 days ago
Current models don't yet use RLVR with self-play, though, at least as far as we know. Instead, they use RLVR with large numbers of manually created RL environments.

But they will probably use self-play soon. See https://www.amplifypartners.com/blog-posts/self-play-and-aut...