Hacker News
by emp17344 20 days ago
RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.

Right, so you have to use RLHF. That is the economics problem I was referring to.