Y
Hacker News
new
|
ask
|
show
|
jobs
by
pegasus
141 days ago
Look into RLVR (Reinforcement Learning with Verifiable Rewards). It happens during model post-training.
1 comments
geraneum
138 days ago
That’s not happening in run time, when you send the prompts for the next token to be generated. The comment seems to imply a run time phenomenon.
link