| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by pegasus 141 days ago
	Look into RLVR (Reinforcement Learning with Verifiable Rewards). It happens during model post-training.

1 comments

That’s not happening in run time, when you send the prompts for the next token to be generated. The comment seems to imply a run time phenomenon.