by krackers 301 days ago
Yup, RLVR as implemented by DeepSeek et al. uses only outcome supervision (a reward on the final answer) rather than process supervision (rewards on the intermediate reasoning steps). There have been attempts at process supervision, though.
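To make the distinction concrete, here is a minimal sketch on a toy arithmetic task. The names `outcome_reward`, `process_rewards` and `toy_step_checker` are hypothetical. The step checker is a toy stand-in for a learned process reward model and is not what DeepSeek or anyone else actually uses. The point is that an outcome reward gives full credit to a trajectory with a wrong intermediate step as long as the final answer matches, while per-step rewards flag that step.

```python
import operator
from typing import Callable, List

# Toy arithmetic operators for the hypothetical step checker below.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def outcome_reward(final_answer: str, reference: str) -> float:
    """Outcome supervision: one verifiable reward for the whole trajectory
    (here, 1.0 if the final answer exactly matches the reference, else 0.0)."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_rewards(steps: List[str], step_scorer: Callable[[str], float]) -> List[float]:
    """Process supervision: a separate score for every intermediate step."""
    return [step_scorer(step) for step in steps]


def toy_step_checker(step: str) -> float:
    """Hypothetical stand-in for a process reward model: parses a step of the
    form 'a op b = c' and returns 1.0 if the arithmetic is correct, else 0.0."""
    lhs, rhs = step.split("=")
    a, op, b = lhs.split()
    return 1.0 if OPS[op](int(a), int(b)) == int(rhs) else 0.0


# The first step is wrong (2 + 3 is not 6), yet the final answer is right.
steps = ["2 + 3 = 6", "6 - 1 = 5"]
print(outcome_reward("5", "5"))                   # 1.0
print(process_rewards(steps, toy_step_checker))   # [0.0, 1.0]
```

The catch with process supervision is that a real step scorer cannot just evaluate arithmetic. It usually has to be learned from step-level labels, which is far more expensive and easier to reward-hack than checking a final answer.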