Hacker News
by
viraptor
198 days ago
In what way? The panel-of-experts approach has been around for a while now, and it's documented to improve quality.
gunalx
197 days ago
Well, it's problematic because they are using their own verifier as a panel of experts, with their own model trained specifically to satisfy this verifier. For the benchmark runs, they don't mention using human experts to cross-validate their scores.
cubefox
197 days ago
I assume they use self-verification only during RL training to provide the reward signal, but not for benchmarks.