Hacker News
by
emp17344
20 days ago
RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.
jeremyjh
19 days ago
Right, so you have to use RLHF. That is the economics problem I was referring to.