Hacker News
by cubefox, 202 days ago
I assume they use self-verification only during RL training to provide the reward signal, but not for benchmarks.