by krackers, 301 days ago
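Yup, RLVR as implemented by DeepSeek et al. uses only outcome supervision rather than process supervision: the reward comes from verifying the final answer, not from grading the intermediate reasoning steps. There have been attempts at process supervision, though.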
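To make the distinction concrete, here is a rough toy sketch in Python. It is not how DeepSeek or anyone else actually implements this; the function names (`outcome_rewards`, `process_rewards`, `check_arithmetic_step`) and the "a op b = c" step format are made up purely for illustration. The point is just that an outcome reward gives every step the same signal based on the final answer, while a process reward scores each step on its own.

```python
import operator
from typing import Callable, List

# Hypothetical toy setup: each reasoning step is a string like "3 * 4 = 12".
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def check_arithmetic_step(step: str) -> bool:
    """Toy step verifier: True if 'a op b = c' is arithmetically correct."""
    lhs, rhs = step.split("=")
    a, op, b = lhs.split()
    return _OPS[op](int(a), int(b)) == int(rhs)


def outcome_rewards(steps: List[str], final_answer: str, expected: str) -> List[float]:
    """Outcome supervision: one reward from the final answer, shared by all steps."""
    reward = 1.0 if final_answer.strip() == expected.strip() else 0.0
    return [reward] * len(steps)


def process_rewards(steps: List[str], step_ok: Callable[[str], bool]) -> List[float]:
    """Process supervision: each step is scored independently."""
    return [1.0 if step_ok(s) else 0.0 for s in steps]


# Task: compute (2 + 1) * 4 + 7, whose correct answer is 19.
# The trace below makes two errors that cancel out.
trace = ["2 + 1 = 3", "3 * 4 = 13", "13 + 7 = 19"]

outcome = outcome_rewards(trace, final_answer="19", expected="19")
process = process_rewards(trace, check_arithmetic_step)

print(outcome)  # [1.0, 1.0, 1.0]
print(process)  # [1.0, 0.0, 0.0]
```

Under outcome supervision the flawed trace gets full credit because the final answer checks out; a per-step checker flags the two bad steps. In practice the hard part of process supervision is getting a reliable step-level scorer, which this toy verifier glosses over.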